
lakeFS Community
Noy Davidson

Noy Davidson is a BI Developer at Treeverse, the company...

Published on February 12, 2024

Every year, poor data quality costs companies an average of $12.9 million. Beyond the immediate hit to revenue, low-quality data complicates data ecosystems and contributes to poor decision-making in the long run.

In a world where data is a company’s most central asset, used for both operational and strategic purposes, relying on poor-quality data for decisions or operations constantly puts the business at risk.

Data should always reflect reality. If it fails to do so, it may lead to inaccurate decision-making. This often results in inefficient business operations and, in the worst-case scenario, even puts the company at risk.

That’s why data quality is of paramount importance to any organization with data-driven processes. But what exactly are the risks businesses with poor data quality face? Let’s dive into them to understand the value of high data quality for practically every aspect of business operations.

The impact of poor data quality on business

Data quality and business operations

Poor-quality data incidents can present a significant risk to companies. Data teams usually deal with several types of data incidents. 

Some are so familiar that they no longer get noticed and simply become part of the daily routine. Others are small enough that they never get attributed to an actual failure; their effect on the data is often written off as random bias.

And then there are other data incidents, ones that are so significant that they end up causing the loss of millions of dollars – or may even put human life at risk.

Here are two examples of operational issues caused by poor data quality:

1. NASA’s Mars Climate Orbiter

Back in 1999, NASA lost its $125 million Mars Climate Orbiter because of a mismatch in measurement units. Lockheed Martin helped NASA build, develop, and operate the spacecraft. Its engineering team used English (imperial) units of measurement for a critical spacecraft function, while NASA’s team used the metric system.

The unit mismatch corrupted the navigation data exchanged between the spacecraft team at Lockheed Martin in Denver and the flight team at NASA’s Jet Propulsion Laboratory, sending the orbiter off course and ultimately causing its loss.

2. Amsterdam’s tax office blunder

Amsterdam’s tax office once gave out €188 million to around 10,000 households in yearly government rent subsidies instead of €2 million. The reason? Amsterdam’s government software calculated payments in cents instead of euros.

What’s interesting is that no one at Amsterdam’s tax office seemed to have noticed the mismatch. On top of that, the city spent some €300,000 more on trying to understand and resolve the matter.

These two examples show why data consistency is so important. Companies that collect data from multiple systems or collaborate with third-party providers need a strong data quality framework that helps make the data uniform, accurate, and consistent across multiple databases, systems, and applications within a company. 

Data quality and business strategy

How does data quality impact the strategic aspects of running a business? How can stakeholders be sure that the data is reliable for decision making?

In the case of everyday data-driven decisions, making the wrong call due to bad data quality may go unnoticed, especially if it happens only once in a while.

But when errors become frequent and hide among the unknown unknowns, the real trouble begins. In some cases we understand the source of the error very quickly; in others, it may remain a mystery indefinitely.

This means that the company faces issues such as loss of time, resources, and opportunities – all of which translate into money. Making mistakes is costly – and so is analyzing their root causes. But a wrong strategic turn based on poor-quality data isn’t just a loss of resources. In some cases, it might be a fateful turning point for the business.

Every data set is vulnerable to data quality issues, especially big data generated at high velocity and streamed into a data lake. Poor data quality leads to errors that bring about poor customer interactions, erroneous analytics, and bad decisions, all of which impair corporate success.

What causes poor data quality?

Here are the primary causes behind poor-quality data:   

  • Data integration issues – Conversion problems can occur when data is collected from many databases that aren’t integrated with the organization’s database. Converting one data format to another frequently leads to errors, and the challenge grows when data from an older legacy system is converted for storage in a NoSQL system.
  • Data decay – Data quality naturally degrades over time as records go stale (also called data deterioration or data degradation); contact records in marketing and sales departments are especially prone to it.
  • Poor data migration – This occurs when data is transferred from a legacy system to a new database or the cloud. Moving data to a new system carries risks: values may go missing or data may be corrupted in transit.
  • Data duplication – Duplicate records skew counts and distort any statistics computed from the data.
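To make the migration risk above concrete, here is a minimal sketch of a post-migration sanity check. The in-memory lists stand in for real database reads, and the field names are illustrative assumptions, not part of any particular system:

```python
# Hypothetical post-migration check: compare row counts between the
# source extract and the migrated copy, and count fields whose values
# were lost (became None) during the move.
source = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Alan"},
]
migrated = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": None},  # value lost during migration
]

def migration_report(src, dst):
    """Summarize row-count drift and null fields in the migrated data."""
    report = {"row_count_match": len(src) == len(dst), "null_fields": 0}
    for row in dst:
        report["null_fields"] += sum(1 for v in row.values() if v is None)
    return report

report = migration_report(source, migrated)
```

Even a check this simple catches the most common migration failures (dropped rows and nulled-out values) before they reach consumers downstream.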

The good news is that data quality is measurable, and you can improve it with solid data quality management best practices, including data quality monitoring. This builds trust in data across all of its consumers, reduces uncertainty about data quality, and gives teams data they can rely on when making decisions.

6 dimensions of data quality

Data quality can be measured against specific data quality dimensions. Depending on the unique nature of your business operations, strategy, and industry, these may take different forms. 

Once you establish these dimensions, you’re ready to line up the data quality metrics that will apply to your case.

Here’s a short overview of data quality dimensions, together with examples:

1. Completeness

Data is considered “complete” when it includes all of the required information. Imagine that you ask a consumer to submit their name. You may make the middle name optional, but the data is complete as long as you have their first and last names. You should assess whether all of the required information is available and whether any components are missing. 
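The name example above can be sketched as a small completeness check. This is an illustrative sketch, assuming records arrive as dictionaries and that first and last name are the only required fields:

```python
# Hypothetical completeness check: a record is complete when every
# required field is present and non-empty; optional fields (like a
# middle name) don't affect the result.
REQUIRED_FIELDS = ["first_name", "last_name"]

def is_complete(record, required=REQUIRED_FIELDS):
    """Return True when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required)

records = [
    {"first_name": "Ada", "middle_name": "B.", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},  # middle name optional
    {"first_name": "Grace", "last_name": ""},       # missing last name
]
complete = [r for r in records if is_complete(r)]
```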

2. Accuracy

This is an important aspect of data quality because it demonstrates how well the information represents the event or item portrayed. For example, if a consumer is 32 years old but the system believes they are 34, the information is wrong. 

What can you do to increase your accuracy? Consider if the data truly reflects the situation. Is there any erroneous information that needs to be corrected? 
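One way to catch the age mismatch described above is to cross-check the stored value against one derived from a more authoritative field. The sketch below recomputes age from a recorded birth date; the field names are assumptions for the example:

```python
# Hypothetical accuracy cross-check: recompute age from the birth date
# and flag records whose stored age disagrees with reality.
from datetime import date

def computed_age(birth_date, as_of):
    """Age in whole years as of the given date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday hasn't occurred yet this year
    return years

record = {"birth_date": date(1992, 3, 14), "stored_age": 34}
actual = computed_age(record["birth_date"], date(2024, 6, 1))
accurate = actual == record["stored_age"]
```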

3. Consistency

The same data may be stored in several places across different teams. If the information matches, it’s considered “consistent.” Consistency is an important data quality factor, especially in situations where there are many data sources.

For example, if your human resources information systems show that an employee has left the company but your payroll system shows that they are still getting a paycheck, your data is inconsistent.

To address inconsistency problems, inspect your data sets to ensure that they are the same in every instance. Is there any proof that the data contradicts itself?
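The HR-versus-payroll example above can be expressed as a cross-system check. This is a hedged sketch: the two dictionaries stand in for exports from hypothetical HR and payroll systems, and the field names are invented for illustration:

```python
# Hypothetical consistency check: flag employees marked terminated in
# the HR export but still drawing pay in the payroll export, keyed by
# employee id.
hr_records = {
    "e001": {"status": "active"},
    "e002": {"status": "terminated"},
}
payroll_records = {
    "e001": {"on_payroll": True},
    "e002": {"on_payroll": True},  # still being paid after leaving
}

def find_inconsistencies(hr, payroll):
    """Return ids present as terminated in HR yet active in payroll."""
    return sorted(
        emp_id
        for emp_id, rec in hr.items()
        if rec["status"] == "terminated"
        and payroll.get(emp_id, {}).get("on_payroll", False)
    )

mismatches = find_inconsistencies(hr_records, payroll_records)
```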

4. Validity

Data validity is a dimension that indicates how well data matches business requirements or follows a certain format. Birth dates are a common example, as many systems require you to enter your birth date in a specific format. If you don’t, the data is invalid.

To meet this data quality dimension, you must guarantee that all of your data follows a specific format or set of business standards. 
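The birth-date example is easy to sketch in code. The check below assumes a (hypothetical) system that expects ISO 8601 `YYYY-MM-DD` dates; any other format counts as invalid:

```python
# Illustrative validity check: a value is valid only if it parses as a
# date in the expected YYYY-MM-DD format.
from datetime import datetime

def is_valid_birth_date(value):
    """Return True only for parseable YYYY-MM-DD date strings."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        return False

valid = is_valid_birth_date("1992-03-14")    # well-formed
invalid = is_valid_birth_date("14/03/1992")  # wrong format
```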

5. Uniqueness

“Unique” data is data that only appears once in a database. We are all aware that data duplication occurs frequently. For example, it is likely that “Daniel A. Lawson” and “Dan A. Lawson” are the same person, but in your database, they will be classified as separate entries.

Meeting this data quality factor requires ensuring that your data is not duplicated.
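The “Daniel A. Lawson” versus “Dan A. Lawson” case shows why exact-match de-duplication misses near duplicates. The sketch below normalizes names before comparing; the nickname map is a hand-rolled assumption for the example, not a general solution:

```python
# Hypothetical de-duplication pass: normalize case, punctuation, and
# known nicknames so near-duplicate names collapse to one key.
NICKNAMES = {"dan": "daniel"}

def normalized_key(name):
    """Lowercase, strip periods, and expand known nicknames."""
    parts = name.lower().replace(".", "").split()
    parts = [NICKNAMES.get(p, p) for p in parts]
    return " ".join(parts)

entries = ["Daniel A. Lawson", "Dan A. Lawson", "Maya Ortiz"]
seen, unique = set(), []
for entry in entries:
    key = normalized_key(entry)
    if key not in seen:  # keep only the first occurrence of each key
        seen.add(key)
        unique.append(entry)
```

Real entity resolution usually needs fuzzy matching rather than a fixed nickname table, but the keep-first-occurrence structure is the same.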

6. Timeliness

Is your data easily available when it is needed? This data quality dimension is referred to as “timeliness.” Assume you want financial data every quarter; if the data is provided when it is expected, it may be deemed timely.

The timeliness dimension reflects specific user expectations: if your data isn’t available when you need it, it doesn’t meet that criterion.
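The quarterly example above reduces to a simple freshness check. The 92-day cutoff here is an assumption standing in for a real quarterly cadence:

```python
# Illustrative timeliness check: quarterly data is "timely" if the most
# recent load is no older than the expected cadence.
from datetime import date, timedelta

def is_timely(last_loaded, as_of, max_age_days=92):
    """Return True when the latest load is within the allowed age."""
    return (as_of - last_loaded) <= timedelta(days=max_age_days)

on_time = is_timely(date(2024, 1, 5), date(2024, 2, 12))  # fresh load
stale = is_timely(date(2023, 9, 1), date(2024, 2, 12))    # overdue load
```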

Wrap up

Clean, consistent, conformed, current, and comprehensive. The five Cs of data apply to all forms of data, large or small. Your data processing should include checks for all of these dimensions.

Check out this guide to data quality monitoring to get an overview of the processes and tools that help to improve data across entire organizations.
