8 Data Quality Metrics to Measure Data Quality

Idan Novogroder

January 30, 2024

Organizations collect, store, and process data propelled by the vision of smarter, more accurate, and fully data-driven decision-making. But to achieve that, they need to meet several prerequisites, such as setting up a way to collect data, developing data literacy among business teams, and – perhaps most importantly – building confidence and trust in data.

If your data consumers don’t trust your data, they may be permanently turned off from using it to support their decisions.

This is where data quality can help. The term is rather broad and encompasses all of the aspects that determine whether data can be trusted for its intended purpose. Enhancing data quality often feels like pushing a boulder up a hill just to have it roll back down on you – and you’ll be pushing it forever.

But here’s some good news: with the correct measures, tactics, and procedures, that workload can shrink, the slope can get flatter, and you can get stronger.

The best way to improve anything starts with knowing where you are – which means measuring data quality and tracking your progress. What you need is continuous data quality monitoring that focuses on the right data quality metrics.

Find out what these metrics are, how to measure them, and how to put them into practice in your organization.

What are data quality metrics?

Data quality metrics are indicators teams use to measure data quality. They help to differentiate high-quality data from low-quality data.

But the above is just one of the many benefits. Data quality measurement also brings the following advantages:

  • Identifying areas for improvement – Quality metrics show you where you can enhance product quality and performance by bringing issues such as a high failure rate or a poor customer happiness rating to light.
  • Understanding process effectiveness – Quality metrics such as tracking delivery times or task times help teams understand if their current processes are successful.
  • Developing solutions – In industries such as manufacturing, assessing quality helps you fine-tune your approach, for instance, leading to investment in different materials to minimize the number of errors.
  • Cost reduction – Data quality issues increase costs by necessitating rework, additional QA, and warranty payouts to customers. By evaluating quality metrics, you can spot problems early on, deliver higher-quality products, and lower long-term expenses.

Data quality dimensions vs. data quality metrics vs. KPIs

Data quality dimensions are categories of related data quality concerns that frequently share comparable underlying causes.

Data quality metrics define how a dimension is explicitly measured, either quantitatively or qualitatively, and can be tracked over time.

Data quality KPIs (Key Performance Indicators) are the measures that reflect how effectively you are meeting your company objectives.

Let’s explain the difference between data quality dimensions and metrics using the example of timeliness. 

When you assess timeliness as a data quality dimension, you’re likely to ask the following questions:

  • Why is the data in my business intelligence tool not up to date?
  • Why does it take so long for my dashboard to refresh?

These concerns shape your timeliness metrics, such as:

  • Number of hours in which the service level agreement was not satisfied
  • The difference between dashboard access time and the most recent data refresh time (see the sketch after this list)
  • The average time between ELT load and reverse ETL operationalization
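
As a minimal sketch of the second metric above – with made-up timestamps standing in for the BI tool’s refresh metadata and the dashboard access time – the calculation is a simple time difference:

    from datetime import datetime, timezone

    # Hypothetical timestamps: in practice, last_refresh comes from your BI tool's
    # metadata and access_time is captured when the dashboard is opened.
    last_refresh = datetime(2024, 1, 30, 6, 0, tzinfo=timezone.utc)
    access_time = datetime(2024, 1, 30, 14, 30, tzinfo=timezone.utc)

    # Freshness lag: how old the data was at the moment the dashboard was viewed.
    freshness_lag_hours = (access_time - last_refresh).total_seconds() / 3600
    print(f"Data was {freshness_lag_hours:.1f} hours old at access time")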

8 metrics to assess data quality

1. Data to errors ratio

This ratio provides a clear approach to assessing data quality. It involves measuring the number of known data errors – such as incomplete, missing, or redundant entries – within a data set, and then dividing that number by the total size of the data set.

When you uncover fewer mistakes while the size of your data remains constant or increases, you know your data quality is improving.

The problem with this strategy is that your data may contain errors you’re not yet aware of. With that in mind, relying on the data-to-errors ratio alone is tricky, because it can paint an overly optimistic picture of data quality.
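
As a minimal sketch of the basic calculation – assuming your data fits in a pandas DataFrame and using purely illustrative error rules that you would replace with your own validation logic – the ratio looks like this:

    import pandas as pd

    # Hypothetical sample data; in practice this would be your real data set.
    df = pd.DataFrame({
        "customer_id": [1, 2, 2, None],
        "zip_code": ["10001", None, "abcde", "94107"],
    })

    # Illustrative error conditions: missing IDs, duplicate IDs, malformed zip codes.
    has_error = (
        df["customer_id"].isna()
        | df.duplicated(subset="customer_id", keep=False)
        | ~df["zip_code"].fillna("").str.fullmatch(r"\d{5}")
    )

    error_ratio = has_error.sum() / len(df)
    print(f"{error_ratio:.0%} of records contain at least one known error")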

2. Number of empty values

Empty values frequently suggest that critical information is missing or that it was recorded in the incorrect field. This is a reasonably simple data quality issue to track. 

All you need to do is count the number of entries in a data set that have empty fields, and then track that figure over time. 

Of course, it makes sense to concentrate on the data fields that contribute most to overall value. For example, an optional memo field may not be a useful indicator of data quality, but an important value such as a zip code or phone number says much more about the overall completeness of a data set.
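
A minimal sketch with pandas – the file name and list of critical fields below are hypothetical placeholders for your own schema:

    import pandas as pd

    # Hypothetical data set and field names; adjust to your own schema.
    df = pd.read_csv("customers.csv")
    critical_fields = ["zip_code", "phone_number", "email"]

    # Empty values per critical field, plus a single total to track over time.
    empty_per_field = df[critical_fields].isna().sum()
    print(empty_per_field)
    print(f"Total empty values in critical fields: {empty_per_field.sum()}")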

3. Data transformation errors

Problems with data transformation – the process of taking data stored in one format and translating it to another – are frequently indicative of data quality issues. If a necessary field is null or includes an unexpected value that doesn’t correspond to business requirements, the transformation process is likely to fail. 

You can learn about the overall quality of your data by counting the number of data transformation operations that fail (or take too long to complete).
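
As a hedged sketch – the transform function and sample records below are made up for illustration – tracking this metric can be as simple as counting failures during a run:

    from datetime import datetime

    def transform(record: dict) -> dict:
        # Hypothetical transformation: fails when required fields are missing
        # or contain values that don't match expectations.
        return {
            "amount_cents": int(round(float(record["amount"]) * 100)),
            "created_at": datetime.fromisoformat(record["created_at"]),
        }

    records = [
        {"amount": "10.50", "created_at": "2024-01-30T12:00:00"},
        {"amount": None, "created_at": "2024-01-30T12:05:00"},
    ]

    failed = 0
    for record in records:
        try:
            transform(record)
        except (KeyError, TypeError, ValueError):
            failed += 1

    print(f"{failed} of {len(records)} transformations failed")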

4. Amount of dark data

Dark data is data that your company collects and stores but doesn’t use in any way. Large amounts of dark data are often an indicator of underlying data quality issues since no one bothers to look at it. 

Many companies still don’t have a full understanding of the potential worth their existing data carries. If you wish to use that data, now is the moment to bring it out of the closet and assess its correctness, consistency, and completeness.
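
One rough way to approximate this metric – assuming you can export a table inventory and a recent query log, both of which are stand-ins below – is to measure the share of tables nobody has queried lately:

    # Hypothetical inputs: every table in the warehouse, and the tables referenced
    # in the query log over the last 90 days.
    all_tables = {"orders", "customers", "legacy_events", "clickstream_2019"}
    recently_queried = {"orders", "customers"}

    dark_tables = all_tables - recently_queried
    dark_share = len(dark_tables) / len(all_tables)
    print(f"{dark_share:.0%} of tables appear to be dark data: {sorted(dark_tables)}")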

5. Data storage costs

Are your data storage costs increasing while the amount of data you actually use remains constant? This is frequently a sign of poor data quality.

If you’re keeping data without using it, it’s possible that the data has quality issues. If, on the other hand, your storage costs fall while your data activities remain stable or rise, you’re likely increasing data quality.

6. Data time-to-value

How quickly can your team convert data into business value? The answer can reveal a great deal about the overall quality of your data.

If data transformations produce a high number of mistakes or if human involvement and manual cleanup are necessary, this may indicate that your data quality isn’t as good as it should be. This means that it’s high time you started developing a solid data quality framework.

7. Email bounce rates

Sales and marketing efforts can only be successful if you have a high-quality email list to work with. Customer and prospect data may degrade quickly, resulting in low-quality data sets and campaigns that perform poorly. 

One of the most prevalent causes of email bounces is poor data quality: they occur when you send emails to incorrect addresses because of inaccurate, missing, or obsolete data.
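
The metric itself is simple to compute; here is a sketch with made-up campaign numbers:

    # Hypothetical campaign numbers; in practice they come from your email platform.
    emails_sent = 12_000
    emails_bounced = 540

    # A rising bounce rate over time suggests contact data is decaying.
    bounce_rate = emails_bounced / emails_sent
    print(f"Bounce rate: {bounce_rate:.1%}")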

8. Cost of quality

Finally, there’s one metric that matters for every team investing in data quality management activities.

Having high quality data is the holy grail of every data practitioner. But we all operate in the context of organizations with the core focus of driving revenue and growing their business, not polishing their data sets. 

It’s essential that you’re able to show the value your investments in data quality have for the business. Perhaps some data quality metrics are more mission-critical than others? The metrics you focus on will always depend on your organization’s unique requirements.

How to improve data quality using data quality metrics

Poor quality data takes on many different forms, but there are some common denominators you can focus on when improving data quality. One useful framework for organizational operational development is People, Process, Technology (PPT). 

Here are a few best practices that derive from this framework and others to help you improve data quality using the data quality metrics you’ve been tracking.

Hold your team accountable for data quality metrics

How can your team members help you develop data quality metrics, hold the company accountable for meeting those criteria, and contribute to high-quality data?

For example, the engineering team can use pull request reviews to limit breaking changes in upstream systems that could affect the stability of your downstream data products.

It’s worth engaging stakeholders beyond the data team – including business teams – so that they directly own the adoption and improvement of data quality measures.

Implement business processes for data quality improvement

Another important point relates to business processes for increasing data quality. You can perform a one-time data quality assessment, or you can focus on ongoing improvement by implementing quarterly OKRs around data quality metrics and metric scorecards.

Also, marketing and sales teams stand to benefit a lot from training sessions on data entry, which ultimately improves data accuracy and validity.

Use data quality tools

Finally, it’s a good idea to make use of technology solutions that help you improve quality at every stage of the data lifecycle. The market for data quality tools is full of solutions for measuring data quality, as well as data cleansing tools for correcting data values. 

How to put data quality metrics into practice

1. Identify your unique pain points around data quality

Which data quality concerns have lately caused the greatest challenges for your company? Perhaps you can pinpoint something specific, like:

  • Customer data not getting refreshed when a sales forecast is created
  • Dashboards that are difficult for business users to interpret 
  • Calculations not being up to date with the most recent business metrics

Once you understand your pain points, you can move on to picking the most efficient tactics to deal with them and increase the quality of your data.

2. Define use cases specific to your organization

This point is closely related to the previous one, where you defined your pain points. Apart from documenting your data challenges, make sure to understand your goals for specific use cases related to data.

Here are a few examples:

  • Your use case is sales forecasting for quarterly board meetings – your goal here would be to deliver updated reports based on complete data
  • Your use case is supporting marketing efforts around emailing – your goal is to develop a database of customer records that are up to date and complete

This is how you connect your data quality metrics to what matters most and determine which stakeholders should be involved.

3. Connect to data quality dimensions

Next, it’s time to connect your data quality dimensions to the metrics you’ll be tracking to assess them.

Let’s take the dimension of timeliness as an example.

How are you going to measure it? Here are a few example metrics: 

  • The mean time difference between the most recent refresh of the data within the dashboard and the dashboard’s access time
  • A quarterly survey of executives asking them to rate the usability of the data in their dashboards 
  • The share of dashboard outputs that pass a set of independent tests applying business rules such as “the net revenue is more than 0” (see the sketch after this list)
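
A minimal sketch of the last idea – assuming the dashboard’s outputs can be pulled into a pandas DataFrame; the rules and data below are illustrative:

    import pandas as pd

    # Hypothetical dashboard outputs pulled into a DataFrame for independent testing.
    outputs = pd.DataFrame({
        "region": ["NA", "EMEA", "APAC"],
        "net_revenue": [120_000, -5_000, 40_000],
    })

    # Business rules applied outside the dashboard itself.
    rules = {
        "net revenue is more than 0": outputs["net_revenue"] > 0,
        "region is populated": outputs["region"].notna(),
    }

    for rule, passed in rules.items():
        print(f"{rule}: {int(passed.sum())} of {len(outputs)} outputs pass")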

4. Describe how to measure the metrics

Depending on their business goals, organizations may assess quality metrics differently. Here are a few best practices teams use to stay on top of these metrics:

  • Determine the quality factor to be measured – For example, you can track the time it takes to respond to client inquiries or the number of error messages a customer receives. Consider controllable factors since the results may recommend changes to requirements, methods, or products.
  • Determine the scope – Data quality metrics can be tracked for a single feature or an entire range of projects. Determining the scope of your measurement early on is important to keep the data quality initiative focused on what matters most.
  • Keep track of changes – Data collected is frequently a snapshot in time. Consider tracking changes over time to determine the performance of your quality metrics. 

5. Make metrics actionable and easy to use

It’s good to have data quality metrics, but even better to make them useful for the stakeholders who are trying to get a clearer picture of data quality and take action. Visualizing your data quality indicators can help you present and communicate them.

Data visualization improves the understandability, engagement, and actionability of your data quality metrics. It can also help you identify patterns, trends, outliers, and anomalies in the quality of your data. Select the visualization approaches that best explain your data quality metrics. To demonstrate the distribution of data quality measures, for example, you can use bar charts, pie charts, or histograms.
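
For instance, here is a minimal matplotlib sketch – with made-up completeness numbers – that turns per-field empty-value rates into a bar chart:

    import matplotlib.pyplot as plt

    # Hypothetical per-field shares of empty values, computed elsewhere.
    null_rates = {"zip_code": 0.04, "phone_number": 0.12, "email": 0.02}

    plt.bar(list(null_rates.keys()), list(null_rates.values()))
    plt.ylabel("Share of empty values")
    plt.title("Completeness by field")
    plt.tight_layout()
    plt.show()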

Conclusion

The industry in which your company operates, the nature of your data, and the part it plays in achieving your objectives are just a few of the variables that affect the choice of data quality dimensions and metrics.

Because each industry has its own set of data rules, reporting mechanisms, and measurement criteria, you can use a different set of data quality metrics to meet the needs of each case. The choice of the right metrics will impact your entire data quality assessment effort, helping you to spot data quality issues early on, before they snowball into massive problems that affect the business.

Looking for more practical steps to take on the way towards data quality? Head over to our guide to data quality monitoring.
