The data version control system of choice for Databricks users
lakeFS for Databricks enables rapid data delivery and makes data management possible across your entire Databricks environment.
lakeFS, an official Databricks Technology Partner, provides a scalable data version control system for all your data use cases
Databricks Unity Catalog
Version control your catalog for seamless dataset management from dev to production
Delta Lake Support
Version across multiple tables and unlock multi-table time travel
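To illustrate multi-table time travel, here is a hedged PySpark sketch that reads two Delta tables pinned to the same lakeFS commit. The repository name, commit ID, and table paths are placeholders, and it assumes the cluster is already configured against the lakeFS S3-compatible gateway (see the Databricks Notebooks sketch below).

```python
# Minimal sketch: two Delta tables read at one lakeFS commit, so they
# time-travel together. Repo name, commit ID, and paths are placeholders;
# `spark` is the SparkSession provided by the Databricks runtime.
commit_id = "abc123"  # a lakeFS commit that captured both tables in a single change

orders    = spark.read.format("delta").load(f"s3a://analytics/{commit_id}/tables/orders")
customers = spark.read.format("delta").load(f"s3a://analytics/{commit_id}/tables/customers")

# Both DataFrames reflect the exact state recorded by that single commit.
orders.join(customers, "customer_id").show()
```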
Databricks Workflows
Embed lakeFS in workflows and make versioning an integral part of your data pipeline
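As one hedged illustration of embedding lakeFS in a Workflows run, the sketch below uses the high-level `lakefs` Python SDK. The repository, branch, and run names are placeholders; the SDK expects your lakeFS endpoint and credentials to be configured (for example via environment variables), and the exact method names should be checked against the SDK version you install.

```python
import lakefs  # high-level lakeFS Python SDK

repo = lakefs.repository("pipeline-data")  # placeholder repository name

# Branch off production at the start of the run: zero-copy isolation.
run_branch = repo.branch("job-run-1234").create(source_reference="main")

# ... the job's tasks write their outputs to the run branch here ...

# Commit the results with metadata linking back to the Workflows run.
run_branch.commit(
    message="daily aggregation job",
    metadata={"workflow": "daily-agg", "run_id": "1234"},
)

# Promote to production only once the run has succeeded.
run_branch.merge_into(repo.branch("main"))
```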
MLflow Experiments
Track data and model versions in MLflow and ensure safe, reproducible ML experiments
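One hedged way to tie the two together (repository, commit, and dataset path below are placeholders): tag each MLflow run with the lakeFS commit the training data was read from, so the experiment can later be reproduced against the exact same dataset.

```python
import mlflow  # available on Databricks ML runtimes

data_repo   = "ml-data"   # placeholder lakeFS repository
data_commit = "abc123"    # lakeFS commit ID of the frozen training dataset

with mlflow.start_run():
    # Record the data version alongside the model run for reproducibility.
    mlflow.set_tag("lakefs_repo", data_repo)
    mlflow.set_tag("lakefs_commit", data_commit)

    train_df = spark.read.parquet(f"s3a://{data_repo}/{data_commit}/datasets/train/")
    # ... train and log the model here ...
    mlflow.log_metric("training_rows", train_df.count())
```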
Databricks Notebooks
Access lakeFS from within your Databricks Notebooks and keep your data and notebook versions synced
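A hedged notebook sketch of one common setup: pointing the Spark session at the lakeFS S3-compatible gateway so lakeFS paths read like any other bucket. The endpoint, keys, repository, and branch are placeholders, and credentials should come from a Databricks secret scope rather than notebook source.

```python
# `sc` and `spark` are provided by the Databricks notebook runtime.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "https://lakefs.example.com")   # your lakeFS endpoint
hadoop_conf.set("fs.s3a.access.key", "<lakeFS access key>")
hadoop_conf.set("fs.s3a.secret.key", "<lakeFS secret key>")
hadoop_conf.set("fs.s3a.path.style.access", "true")

# Paths follow s3a://<repository>/<branch, tag, or commit>/<object path>
df = spark.read.parquet("s3a://example-repo/main/datasets/events/")
df.show()
```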
Support for Databricks Compute Options
Run lakeFS across all Databricks compute options
Use Git-like operations to gain control over your data
Data Engineering
ML & AI
Analytics
Increase your data engineering velocity
Isolated Pipeline Development
Develop data pipelines in isolation without interfering with production data
Write-Audit-Publish Data with Databricks Jobs
Manage data flows and ensure data quality before your pipelines are deployed to production
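To make write-audit-publish concrete, a hedged sketch of the audit step (placeholder names throughout): the job writes to a staging branch created as in the Databricks Workflows sketch above, validates the staged table, and only then commits and merges into production.

```python
import lakefs

repo        = lakefs.repository("pipeline-data")   # placeholder repository
branch_name = "audit-orders-run-1234"              # staging branch from the sketch above
staged_path = f"s3a://pipeline-data/{branch_name}/tables/orders"

# Audit: quality checks run against the staged table, invisible to readers of main.
staged = spark.read.format("delta").load(staged_path)
assert staged.count() > 0, "audit failed: empty output"
assert staged.filter("order_id IS NULL").count() == 0, "audit failed: null order IDs"

# Publish: only a passing audit commits and merges the change into production.
staging = repo.branch(branch_name)
staging.commit(message="orders batch passed audits")
staging.merge_into(repo.branch("main"))
```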
Multi Delta-Table Transactions
Record changes made to multiple Delta tables as part of a logical pipeline step and leverage multi-table time travel
Versioned Medallion Architecture
Use distinct repositories for the Bronze, Silver, and Gold layers, and commit metadata to track the lineage of data changes
Deliver ML projects to production faster
Data Preparation in Isolation
Track all preprocessing changes and ensure only valid data reaches production
Parallel ML Experimentation
Run multiple experiments simultaneously, using different dataset versions, without duplicating data
Machine Learning Data Reproducibility
Maintain consistent datasets while adjusting model parameters, and track both in the Databricks ML Experiments view
Fast Data Loading for Deep Learning Workloads
Localize data to reduce latency and cut costs by optimizing GPU utilization
Advanced Unstructured Data Filtering
Simplify model development by filtering objects using custom tags
Increase data quality with engineering best practices
Test Before Deploying
Run quality checks on your data before going to production
Effortless Team Collaboration
Easily share and edit specific data versions without stepping on each other’s toes
Fast Error Recovery
Quickly roll back to a stable state after data errors, keeping operations smooth with minimal downtime
Reliable Data Audit Trails
Maintain a comprehensive log of all production data changes, including who made them and why
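A hedged sketch of reading that trail with the lakeFS Python SDK; the `log` call and commit fields below reflect my understanding of the SDK surface and should be verified against the version you use.

```python
import lakefs

repo = lakefs.repository("pipeline-data")   # placeholder repository

# Walk the commit history of the production branch: who changed what, and why.
for commit in repo.branch("main").log():
    print(commit.id, commit.committer, commit.message, commit.metadata)
```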
Keep Your Production Data Safe
Develop and test ETL changes on production data without actually modifying or copying data
Guaranteed performance, flexibility and quality you can control
Support All Data Formats
Work with any of your data formats: plain text, open table formats, images, videos, you name it
Scalable and Performant
lakeFS supports billions of objects with negligible impact on critical-path storage operations
Data Stays in Place
No need to lift and shift - lakeFS manages your data wherever you store it, in the cloud or on-premises
Remote Compute
Integrate seamlessly with compute engines. Use Databricks, Trino, Spark or any other compute engine you choose
Quality Outcomes Guaranteed
Run quality checks on pipeline results before promoting the data to production
Featured real-world use cases
Learn from industry leaders and watch how lakeFS for Databricks works