The lakeFS Team

Last updated on August 5, 2025

You have probably heard the phrase “data is the new oil.” As more businesses use and benefit from data, it becomes a valuable currency. With data, a company can plan and build strategies for the future. These plans may include making its marketing far more efficient or raising product quality to reach a larger audience. Comprehensive data about your market and your consumers is key to improving sales, product quality, and marketing strategy.

Companies build data products and applications that collect, analyze, present, and derive insights from data. The term “data product” matters because it implies that the best practices of product delivery should apply when building data-intensive applications.

What Are Data Products?

A data product is any tool or application that processes data and generates insights. These insights help businesses make better decisions for the future.

Users or customer organizations can then sell or consume the stored data, processing it as needed. Netflix's recommendation algorithm is a real-life example of a data product: based on stored data about the movies and shows a viewer liked or disliked, it suggests similar titles to that viewer.

Data Product vs Data as a Product

“Data as a Product” is a philosophy or strategy in which data is treated as a product, with an emphasis on quality, usability, and end-user needs, just like any other product offered by a business. A “data product,” on the other hand, is a concrete output or service built with data, such as a dashboard or a predictive model; its focus is on applying data.

Examples of Data Products

Depending on their goal, companies can apply specific techniques to process valuable data further and extract customer information for later use. Data products can be categorized in several ways, but they are most commonly characterized by how they process data and by the type of service they provide.

Breakdown of Data Services

Raw data

Raw data is the most basic form of data collected into the system: it has not yet been processed or used. Companies can still process it to extract more value.

Example of raw data: All the purchases at a store for a given period of time.

Derived data

Derived data is raw data that has been processed further to make it easier to understand, for example by calculating the average or sum of a given attribute.

Example of derived data: Calculating the average purchase of a customer.
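
As a minimal sketch of the step from raw to derived data, the snippet below computes the average purchase per customer. The purchase records, field names, and the function name are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical raw data: one record per purchase at a store.
raw_purchases = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 50.0},
    {"customer": "bob", "amount": 7.5},
]


def average_purchase_per_customer(purchases):
    """Derive the average purchase amount for each customer from raw records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in purchases:
        totals[record["customer"]] += record["amount"]
        counts[record["customer"]] += 1
    return {customer: totals[customer] / counts[customer] for customer in totals}


derived = average_purchase_per_customer(raw_purchases)
print(derived)  # {'alice': 40.0, 'bob': 10.0}
```

The raw records stay untouched; the derived values are a separate, smaller dataset that is easier to read and reuse.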

Algorithms

A dedicated algorithm processes the data to produce a final result for data consumers. Some of these algorithms run a machine learning model on the data. Two kinds of data products arise from this category: decision support and automated decision-making.

Decision support

Decision support algorithms do much of the processing, but they return the digested result to the user, who draws the conclusions. The product therefore needs human involvement to be effective.

Example of decision support: GPS navigation application.

Automated decision-making

Think of this as a do-it-all algorithm: the data product delivers a final result without any help from the user.

Example of automated decision-making: A self-driving car.

Types of Services Data Products Provide

Automated decision-making or data-enhanced data products

This category includes data products that help make business decisions without human intervention. Algorithms that recommend products or services based on the customer’s previous likes and dislikes belong here. Examples include recommending shows to a Netflix viewer or products to an Amazon shopper.

Studying each customer’s past interests makes it possible to offer more tailored suggestions, which in turn can increase future sales.
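
To make the idea concrete, here is a small sketch of a recommender driven by previous likes. It is not how Netflix or Amazon work; the viewers, titles, and the overlap-based scoring rule are assumptions chosen to keep the example short.

```python
from collections import Counter

# Hypothetical history: the titles each viewer liked.
likes = {
    "ana": {"Drama A", "Thriller B", "Comedy C"},
    "ben": {"Drama A", "Thriller B", "Sci-Fi D"},
    "cal": {"Comedy C", "Sci-Fi D", "Documentary E"},
}


def recommend(viewer, likes, top_n=2):
    """Suggest unseen titles liked by viewers with overlapping tastes.

    Each candidate title is scored by the number of liked titles that its
    fans share with the target viewer.
    """
    seen = likes[viewer]
    scores = Counter()
    for other, other_likes in likes.items():
        if other == viewer:
            continue
        overlap = len(seen & other_likes)
        if overlap == 0:
            continue
        for title in other_likes - seen:
            scores[title] += overlap
    # Sort by score (descending), then by title, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("ana", likes))  # ['Sci-Fi D', 'Documentary E']
```

The product makes the decision on its own: no person reviews the ranking before the viewer sees it.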

Data as a Service Products

Data-as-a-service products offer a service customers can subscribe to by connecting to a given API. We typically create these services by applying various machine learning models to the provided data. Customers typically purchase these services to integrate them into their applications and websites. 

Some examples include weather applications or geography data services. These can provide information about atmospheric pressure, weather conditions, visibility distance, relative humidity, and precipitation in different units and parts of the world. They can also provide information about geographic boundaries and maps.

Data as Insights

In the final category of data products, data does not generate revenue by itself; it is used to improve business insights and, in turn, product sales. In this category, customers do not view the data.

Data insight products allow companies to optimize processes and improve performance; uncover new markets, products, or services that add revenue sources; better balance risk against reward to reduce losses; and deepen customer understanding to increase loyalty and lifetime value. One example of a data insight service is Facebook, which collects customer data to tailor future offers.

Facebook can use its stored data to gain insights about your lifestyle. Based on these insights, each customer can receive a customized offer.

Data Product Characteristics

Characteristic | Description
Discoverable | Data products must be maintained in a registry with suitable metadata descriptions that allow consumers to search quickly.
Quality | Before putting data products into production, the team should invest in current data quality methodologies to detect and correct errors.
Secure | Using self-service analytics requires security in two ways: giving access and permission only to the right people and following data protection rules like HIPAA and GDPR for sensitive personal information.
Observable | Unlike software applications, data continually changes, causing "anomalies" like schema modifications, late and out-of-order data, or data entry errors. In addition, faults in the pipelines and infrastructure may cause some jobs to fail and go undiscovered for an extended period.
Operational | DataOps is a critical tool for delivering practical, agile data engineering. Its features include automation, low/no-code development, continuous integration, testing, and deployment.
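
The "Observable" characteristic can be illustrated with a small check for two of the anomalies it mentions: schema modifications and late data. This is only a sketch; the expected schema, the one-hour lateness threshold, and all field names are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical expectations for one dataset.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "event_time": datetime}
MAX_LATENESS = timedelta(hours=1)


def find_anomalies(records, received_at):
    """Return (index, problem) pairs for schema drift and late-arriving rows."""
    anomalies = []
    for index, record in enumerate(records):
        missing = EXPECTED_SCHEMA.keys() - record.keys()
        extra = record.keys() - EXPECTED_SCHEMA.keys()
        if missing or extra:
            anomalies.append((index, "schema change"))
            continue
        wrong_type = [
            name for name, expected in EXPECTED_SCHEMA.items()
            if not isinstance(record[name], expected)
        ]
        if wrong_type:
            anomalies.append((index, "type mismatch"))
        elif received_at - record["event_time"] > MAX_LATENESS:
            anomalies.append((index, "late data"))
    return anomalies


now = datetime(2025, 1, 1, 12, 0)
records = [
    {"order_id": 1, "amount": 9.99, "event_time": now - timedelta(minutes=5)},
    {"order_id": 2, "amount": 5.0, "event_time": now - timedelta(hours=3)},
    {"order_id": 3, "total": 5.0, "event_time": now},
    {"order_id": "4", "amount": 1.0, "event_time": now},
]
anomalies = find_anomalies(records, received_at=now)
print(anomalies)  # [(1, 'late data'), (2, 'schema change'), (3, 'type mismatch')]
```

Running a check like this on every batch is one way to keep failures from going unnoticed for long periods.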

Data Product Design Recommendations

Collect data passively

Collecting user data should never degrade the user experience. Privacy is unquestionably important, yet customers increasingly expect their personal data to be used to give them better experiences.

Avoid exhausting users

Until a user interacts with your product, no data about them exists, which makes customization impossible. Active data collection techniques can help your product overcome this cold start problem, but they must be integrated into the user experience. Successful data products do this through an easy, engaging onboarding experience that collects the essential data without being unduly demanding.

Constantly validate using data

Launching your data product is just the start. Once people begin to engage, you must continuously validate your data product by tracking key quantitative indicators. The world is constantly evolving, and what works now may not work tomorrow. Furthermore, tracking key metrics allows you to conduct experiments, or A/B tests, to improve the performance of your data product.
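
One common way to evaluate an A/B test on a conversion metric is a two-proportion z-test. The sketch below uses only the standard library; the visitor and conversion counts are made-up numbers, and the choice of this particular test is an assumption rather than something the article prescribes.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B converts 130 of 1000 visitors, A 100 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be noise, but the threshold and the metric itself should be agreed on before the experiment starts.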

Give users control

An overly eager machine learning system that makes too many decisions for users, no matter how accurately, can confuse and frustrate them. Striking the right balance between anticipating needs and leaving users an appropriate level of control can be difficult.

Data Product Build Strategy

This part explains Zhamak’s principles for data products and the architecture of a data product.

Zhamak’s Principles for Data Products

So, who is Zhamak Dehghani? Zhamak Dehghani is the esteemed founder of the data mesh concept. She defines it as “a sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments—within or across organizations.”

According to Zhamak, a product qualifies as a data product only if it satisfies a set of core principles about the data it uses. These principles determine whether a data product is fit for general use. Some of them include:

  • Discoverability: The discoverability of a data product is of the utmost importance. Information such as the data owner, source of origin, location, and quality metrics is necessary for a discoverable product. Such information can improve the data’s trustworthiness and security.
  • Data must hold value in itself: The stored data should be valuable to other organizations. It is better not to collect any data that will not be sold or used for future profits because storing such data comes with its own storage and maintenance costs.
  • Trustworthy and truthful: Businesses rely on data to make significant decisions, so the data needs to be highly trustworthy. Verifying the trustworthiness of any collected data before using it as a source is imperative. How can you check it? By running continuous checks on your datasets that examine data quality and accuracy.
  • Understandable data: Data is easy to use only when consumers understand how it is organized and structured. Knowing the syntax and semantics lets you extract the required data from a database table without outside help.
  • Accessible: Since most collected data will be processed in some way, how different users can access it is crucial. Different uses require different data formats. For example, data used to train machine learning or deep learning models requires a different format than data used to build a graph that reveals common trends or patterns among users.
  • Governed security access to the product: Access to a data product must be governed by a certain entity, as the given data must not be accessible to everyone.
  • Addressable: It is as simple as it seems—each data product requires a unique address so users can easily identify and use it.
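
Several of the principles above can be captured in a small descriptor and an in-memory registry. This is a hypothetical sketch: the class names, fields, and the role-based access rule are illustrative assumptions, not part of any data mesh specification or library.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor covering several of the principles above."""
    address: str          # addressable: unique identifier
    owner: str            # discoverable: who owns the data
    source: str           # discoverable: where the data originates
    description: str      # understandable: what the data means
    formats: tuple        # accessible: output formats on offer
    quality_score: float  # trustworthy: result of quality checks, 0 to 1
    allowed_roles: frozenset = field(default_factory=frozenset)  # governed access


class Registry:
    """In-memory catalog; a real one would be a shared, persistent service."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        if product.address in self._products:
            raise ValueError(f"address already in use: {product.address}")
        self._products[product.address] = product

    def search(self, keyword):
        keyword = keyword.lower()
        return [
            p.address for p in self._products.values()
            if keyword in p.description.lower()
        ]

    def get(self, address, role):
        product = self._products[address]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {address}")
        return product


registry = Registry()
orders = DataProduct(
    address="sales/orders/v1",
    owner="sales-team",
    source="checkout service",
    description="Daily customer orders",
    formats=("parquet", "csv"),
    quality_score=0.98,
    allowed_roles=frozenset({"analyst"}),
)
registry.register(orders)
print(registry.search("orders"))  # ['sales/orders/v1']
```

Keeping the address unique at registration time and checking the caller's role on every read are the two places where the "addressable" and "governed access" principles become enforceable rules.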

Data Product Architecture

As you can imagine, the main components of any data product include the code, the valuable data stored with any additional data describing it, and the product’s infrastructure.

Here are the components of a data product: 

  • Code: This includes the data pipeline code that consumes and processes data, the APIs that provide data access and describe the provided schema, and the code that enforces data access control, compliance, and provenance.
  • Data and metadata: This is the most important part of the structure. The actual datasets are stored here together with their metadata, which should include documentation, semantic and syntax declarations, and quality metrics.
  • Infrastructure: The infrastructure runs and deploys the data product and stores the required data.

Data Product Lifecycle Key Steps

1. Conceptualization

Identify business needs. Understand the data product’s purpose and the users it will serve. Clearly describe what the product will do and the expected outcome. Determine the exact data, functions, and user experience requirements.

2. Design

Plan the data product’s architecture. Decide on data sources, processing methods, and storage options. Create an easy and intuitive interface for users to interact with the product. Set the terms and conditions, SLAs, and KPIs for the data product.

3. Development

Create the data product’s components: the required code, models, dashboards, and other artifacts. Test the data product to ensure it meets the specified standards and functions as intended, providing the expected value and satisfying user expectations.

4. Deployment

Make the data product available to users via a data catalog or other platform. Assign ownership and governance. Create explicit roles and responsibilities for managing the data product. Make the data product easy to discover and use by its target audience.

5. Maintenance and Monitoring

Monitor usage, quality, and the performance of the data product. Implement release management to manage long-term adjustments and upgrades to the data product. Continuously develop the data product in response to customer feedback and changing needs.

6. Retirement

Inform users that the data product is no longer supported. Archive the product, preserve data product artifacts for historical use, and clean up system resources by removing the data product.

Data Product Use Cases

Business decision-making

Data products enable decision-makers to swiftly access relevant information and insights. They enable non-technical users to see beyond the intricacies of data pipelines to spot patterns, opportunities, and potential hazards that affect strategic planning.

Automation

Data products can automate repetitive procedures and processes, saving time and money. Machine learning, artificial intelligence, and APIs can be used to automate tasks like document classification and customer assistance, enhancing efficiency and accuracy.

Real-time monitoring

Real-time data products are vital for monitoring key systems and quickly finding anomalies or emergencies. Monitoring data from power plants, transportation networks, and healthcare equipment, for example, can help ensure quick reactions.

Personalization

Data products tailor experiences for users or customers. For example, recommendation systems for e-commerce platforms use machine learning models to recommend products based on individual tastes and historical behavior.

Customer insights

Businesses employ data products to gain insights into customers’ behavior, preferences, and needs. These insights are used to better understand customer segments, increase customer satisfaction, and build focused marketing strategies.

Performance optimization

Data products can improve processes and systems. For example, predictive maintenance models can use sensor data to predict equipment breakdowns, lowering downtime and maintenance expenses.

Competitor analysis

Data products can analyze competitor data to help firms find market strengths, weaknesses, and opportunities. This competitive knowledge helps refine business tactics.

Why Think of Your Internal Data Operations as a Product?

1. Product Management Methodology

Regardless of your delivery methodology or the agile development flavor you prefer, every product has a product manager responsible for gathering the customers’ needs, defining the functionality to deliver, and prioritizing the work for the development teams.

Having such a function for your data products is critical. A product manager is common when data products are part of what a company sells, but the role is seldom filled when the data product serves internal insights meant to improve the business. Data engineering teams then work without guidance from business stakeholders, so requirements are low quality and prioritization is often missing. Once you treat your internal data operations as a product, you assign a product manager.

2. Engineering Best Practices

Data engineering teams should be able to leverage application development best practices. To do so, they need tools and processes that enable CI/CD for their data products. DevOps practices already cover the code; the data and the infrastructure should be part of the CI/CD process as well. Like the code, the data should be versioned and controlled, and popular technologies like Kubernetes (K8s) should make the infrastructure easy to structure. The world of data is moving in this direction, with more and more data technologies running on K8s and more version control engines for data.
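
To give a feel for what versioning data means, the sketch below derives a version identifier from a dataset's content, so that a pipeline can tell whether the data changed between two runs. It is a simplified illustration using hashing from the standard library, not the API of lakeFS or any other version control engine; the function name and sample rows are hypothetical.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a deterministic content hash for a list of JSON-serializable rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical dataset before and after a change.
rows_v1 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
rows_v2 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.0}]

version_1 = dataset_version(rows_v1)
version_2 = dataset_version(rows_v2)

# A CI step could rerun data tests only when the version changes.
data_changed = version_1 != version_2
print(version_1, version_2, data_changed)
```

Real data version control systems add branching, commits, and rollback on top of this basic idea of an immutable, identifiable snapshot.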

3. Measure and Improve

Product delivery is measured by the velocity of the development teams and by feature implementation versus plan, while quality is measured by bug fixes. With data products, all those measures still apply. In addition, data quality in all its aspects must be monitored, and SLAs should cover not only the availability of the interface but also the freshness of the data.
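
A freshness SLA can be checked with very little code. In this sketch the six-hour limit, the timestamps, and the function name are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_updated, now, max_age):
    """Return (is_fresh, age) for a dataset measured against a freshness SLA."""
    age = now - last_updated
    return age <= max_age, age


# Hypothetical SLA: the dataset must be no more than 6 hours old.
sla = timedelta(hours=6)
now = datetime(2025, 8, 5, 12, 0, tzinfo=timezone.utc)

fresh, fresh_age = check_freshness(
    datetime(2025, 8, 5, 9, 0, tzinfo=timezone.utc), now, sla
)
stale, stale_age = check_freshness(
    datetime(2025, 8, 4, 23, 0, tzinfo=timezone.utc), now, sla
)
print(fresh, fresh_age)  # True 3:00:00
print(stale, stale_age)  # False 13:00:00
```

Tracking the measured age over time, rather than only the pass/fail result, shows how close the product runs to its SLA.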

Future of Data Products

With the total amount of data created and consumed reaching 64.2 zettabytes in 2020 and an estimated 50,000–500,000 zettabytes of data being generated by 2050, bigger and more diverse data sets will need to be created soon. These datasets will allow users to easily identify patterns and continuously improve data products.

Of course, we cannot overlook the opportunities that artificial intelligence and data science are creating for new data products. As organizations begin to acknowledge the merits of emerging technologies like artificial intelligence, machine learning, and deep learning, companies such as Amazon, Google, Alibaba, and Microsoft are investing huge amounts of capital in their artificial intelligence departments to advance them even further.

Conclusion

This article covered the main concepts behind data products: their types, examples, principles, components, and more. With data changing how we look at products and services, who knows what new developments may occur in the coming year or two?

You should now have a better understanding of data products, whether you plan on creating, buying, or working with them in the near future.

Frequently Asked Questions

What are the four main types of data?

The four main categories of data are nominal, ordinal, discrete, and continuous. These categories assist in understanding and evaluating various forms of data.

What makes a data product successful?

A successful data product must be discoverable, comprehensible, and trustworthy. It must also have clear metadata, be tailored to certain business domains, and be easy to connect with other systems. It should represent a cohesive unit of information with independent value and strong access controls that ensure only authorized people have access.

Is an API a data product?

Yes, an API can be considered a data product, particularly if it provides access to structured data and is designed, managed, and used as such. APIs are essential to the data product landscape because they enable data delivery to users and applications.

What is the difference between a data product and a dashboard?

A data product is a packaged solution that uses data analysis and insights to provide value and support decisions. A dashboard is a visual interface that displays essential data metrics and trends. Dashboards are a type of data product, but they primarily provide a high-level overview and performance monitoring, whereas other data products may offer advanced features like predictive analytics or automated data-driven actions.

What is a data product in a data mesh?

In a data mesh architecture, a data product is a unit of data handled as a product, owned and controlled by a domain team, and made available for consumption by other parts of the organization. It is a self-contained, reusable data asset created for a specific use case, complete with defined interfaces and maintenance cycles.
