lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data

Data Engineering

Data Engineering Machine Learning Product

How lakeFS Helps Ensure Data Compliance

Tal Sofer

Data compliance means adhering to the laws, regulations, standards, and internal policies that govern data use. Organizations must comply with regulations like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the California Consumer Privacy Act (CCPA), as well as SOC 2 standards, to protect sensitive information and maintain trust. Data compliance plays […]

Data Engineering Machine Learning

What is Data Compliance? Tools, Benefits & Key Metrics

Tal Sofer

Organizations deal with ever-increasing volumes of data. More data means more risk, because attackers have a larger attack surface. This is where data compliance comes in. It helps mitigate these threats and protect consumer data by setting compliance standards that companies and individuals must adhere to while working with data. How does data compliance

Data Engineering Machine Learning Product

How We Built Our lakeFS Iceberg Catalog

Itai Gilo

A behind-the-scenes look at the design decisions, architecture, and lessons learned while bringing the Apache Iceberg REST Catalog to lakeFS. When we first announced our native lakeFS Iceberg REST Catalog, we focused on what it means for data teams: seamless, Git-like version control for structured and unstructured data, at any scale. But how did we

Data Engineering Machine Learning

What is Data Virtualization? Benefits, Use Cases & Tools

Tal Sofer

Data integration is a vital first step in developing any AI application. This is where data virtualization comes in to help organizations accelerate application development and deployment. By virtualizing data, teams can unlock its full potential by providing real-time AI insights for applications like predictive maintenance, fraud detection, and demand forecasting. Virtualizing data centralizes and

Data Engineering Machine Learning Product

Git-Like Data Versioning Meets MLOps: lakeFS with MLflow, DataChain, Neptune & Quilt

Iddo Avneri

Modern machine learning pipelines involve a mix of tools for experiment tracking, data preparation, model registry, and more. MLflow, DataChain, Neptune, and Quilt are among the MLOps tools that serve these needs. However, one critical piece underpins them all: data version control. This is where lakeFS comes in. lakeFS is not an experiment tracker or ML platform;

Data Engineering Machine Learning

What is Data Discovery, How It Works & Why It Matters

Tal Sofer

Most organizations collect massive amounts of data from various sources, including customer interactions, supply networks, financial systems, and more. As a result, teams may feel overwhelmed by a flood of data while seeking key insights, and the question of data manageability becomes more pressing than ever. This is where data discovery comes in. Data discovery

Data Engineering Machine Learning Thought Leadership

The State of Data and AI Engineering 2025

Einat Orr, PhD

Since 2021, we’ve published the annual State of Data Engineering Report, which includes a summary of all key categories that directly impact data engineering infrastructure. In 2025, we see five primary trends that influence the categories that will be covered in this report. Trend #1: MLOps space is slowly diminishing. The MLOps space is slowly

Data Engineering Machine Learning

What Is an AI Factory and How Does It Work?

Tal Sofer

During the 2025 Nvidia GTC conference, one of the keywords that drew a lot of attention was “AI factory.” An AI factory is Nvidia’s idea for producing large-scale AI systems. This concept aligns AI development with the industrial process, in which raw data is received, improved through computation, and converted into valuable products via data-driven

Best Practices Data Engineering Machine Learning

6 Types of Metadata: Examples, Tools & Frameworks

Idan Novogroder

With the volumes of generated data increasing, metadata has become an essential component in organizing and comprehending massive datasets. Metadata plays a key role in any modern data strategy, especially among organizations that treat data as one of their most precious assets. This article dives into all the different metadata types, tools, and frameworks to

Best Practices Data Engineering Machine Learning

Top Data Lineage Tools for 2025 and Their Benefits

Iddo Avneri

Data lineage tools make it easier for teams to track the movement of data across systems, databases, and applications. Ultimately, this helps teams understand and handle their data better. But how do you choose the best data lineage solution for your organization? This article dives into the most widespread data lineage tools to

Data Engineering Machine Learning

DataOps Best Practices and Top Tools for 2025

Idan Novogroder

DataOps is an approach that aims to enhance collaboration among teams involved in data operations, including data engineers, data scientists, and stakeholders. The idea is to create a more coherent and efficient data-driven environment by automating time-consuming procedures, reducing errors, and speeding up data delivery. This will give companies a faster time to insight and the