
lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data


Learn from AI, ML & data leaders

March 31, 2026  |  Live

The data version control system of choice for Databricks users

lakeFS for Databricks enables rapid data delivery and makes data management possible across your entire Databricks environment.


lakeFS, an official Databricks Technology Partner, 
provides a scalable data version control system for all your data use cases


Databricks Unity Catalog

Version control your catalog for seamless dataset management from dev to production


Delta Lake Support

Version across multiple tables and unlock multi-table time travel

Databricks Workflows

Embed lakeFS in workflows and make versioning an integral part of your data pipeline


MLflow Experiments

Track data and model versions in MLflow and ensure safe, reproducible ML experiments


Databricks Notebooks

Access lakeFS from within your Databricks Notebooks and keep your data and notebook versions synced
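From a notebook, lakeFS branches are addressed through its S3-compatible gateway: the repository acts as the bucket, and the first path segment is the branch or commit. A minimal sketch of building such a path; the repository and branch names are assumptions for illustration:

```python
# Hypothetical example: reading a dataset from a lakeFS branch in a
# Databricks notebook. "analytics-repo" and "dev-branch" are placeholders.

def lakefs_uri(repo: str, ref: str, path: str) -> str:
    """Build an s3a URI for lakeFS's S3 gateway:
    the bucket is the repository, the first path segment is the ref."""
    return f"s3a://{repo}/{ref}/{path.lstrip('/')}"

uri = lakefs_uri("analytics-repo", "dev-branch", "tables/events/")

# Inside a notebook (Spark session provided by Databricks):
# df = spark.read.format("delta").load(uri)
```

Switching the `ref` argument from a branch name to a commit ID pins the notebook to an immutable data snapshot.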


Support for Databricks Compute Options

Run lakeFS across all Databricks compute options

Use Git-like operations to gain control
over your data

Data Engineering  |  ML & AI  |  Analytics

Increase your data engineering velocity

Isolated Pipeline Development

Develop data pipelines in 
isolation without interfering with production data

Write-Audit-Publish Data with Databricks Jobs

Manage data flows and ensure data quality before your pipelines are deployed to production
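The write-audit-publish pattern can be sketched as: write the batch to a short-lived branch, audit it there, and merge to main only if the checks pass. The branch/merge calls below follow the general shape of the lakeFS Python SDK but are commented out since they require a running lakeFS server; the repository and branch names are assumptions:

```python
# Minimal write-audit-publish sketch. Only the audit step runs standalone;
# the lakeFS calls are indicative and commented out.

def audit(rows):
    """Audit step: reject the batch if any row is missing a primary key."""
    return all(r.get("id") is not None for r in rows)

staged = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]

# import lakefs
# branch = lakefs.repository("analytics-repo") \
#                .branch("etl-staging").create(source_reference="main")
# ... write `staged` to the branch from your Databricks job ...

if audit(staged):
    pass  # branch.merge_into("main")  # publish: promote atomically
else:
    pass  # leave the branch unmerged; production never sees the bad batch
```

Because the merge is atomic, consumers of main see either the whole audited batch or none of it.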

Multi Delta-Table Transactions

Record changes made to multiple Delta tables as part of a logical pipeline step and leverage multi-table time travel

Versioned Medallion Architecture

Utilize distinct repositories for Bronze, Silver, and Gold layers, and commit metadata to track the lineage of data changes

Deliver ML projects to production faster

Data Preparation 
in Isolation

Track all preprocessing changes and ensure only valid data reaches production

Parallel ML Experimentation

Run multiple experiments simultaneously, using different dataset versions, without duplicating data

Machine Learning Data Reproducibility

Maintain consistent datasets while adjusting model parameters and track them in the Databricks ML Experiments view
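One way to make a run reproducible is to record the exact lakeFS commit alongside the model parameters as MLflow run tags. A minimal sketch; the tag keys and the commit ID are placeholder assumptions, and the MLflow calls are shown commented since they need a tracking server:

```python
# Hypothetical sketch: pinning the dataset version in an MLflow run.

def data_version_tags(repo: str, ref: str, commit_id: str) -> dict:
    """Tags identifying the exact dataset state used for a training run."""
    return {
        "lakefs.repo": repo,
        "lakefs.ref": ref,
        "lakefs.commit": commit_id,  # immutable: re-reading this commit
    }                                # yields byte-identical training data

tags = data_version_tags("ml-data", "experiment-a", "placeholder-commit-id")

# Inside a Databricks MLflow run:
# with mlflow.start_run():
#     mlflow.set_tags(tags)
#     ... train and log the model ...
```

Any run in the Experiments view can then be reproduced by checking out the tagged commit.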

Fast Data Loading for 
Deep Learning Workloads

Localize data to reduce latency and cut costs by optimizing GPU utilization

Advanced Unstructured Data Filtering

Simplify model development by filtering objects using custom tags

Increase data quality with engineering best practices

Test Before Deploying

Run quality checks on your data before going to production
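Such checks can be enforced automatically with lakeFS pre-merge hooks: an actions file committed under `_lakefs_actions/` in the repository blocks the merge to main unless the hook succeeds. A minimal sketch; the action name and webhook URL are placeholder assumptions:

```yaml
# _lakefs_actions/pre-merge-quality.yaml (illustrative)
name: pre-merge quality gate
on:
  pre-merge:
    branches:
      - main
hooks:
  - id: quality_gate
    type: webhook
    properties:
      url: "https://example.com/validate"   # placeholder endpoint
```

If the webhook returns a failure, the merge is rejected and the unvalidated data never reaches production.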

Effortless Team Collaboration

Easily share and edit specific data versions without stepping on each other’s toes

Fast Error Recovery

Quickly roll back to a stable version after data errors, ensuring smooth operations with minimal downtime

Reliable Data Audit Trails

Maintain a comprehensive log of
all production data changes, including who made them and why

Keep Your Production Data Safe

Develop and test ETL changes on production data without actually modifying or copying data

Guaranteed performance, flexibility and quality you can control

Support All Data Formats

Work with any of your data formats: plain text, open table format, images, videos, you name it


Scalable and Performant

lakeFS supports billions of objects with negligible impact on critical-path storage operations

Data Stays in Place

No need to lift and shift - lakeFS manages your data wherever you store it: Cloud or On-Premises


Remote Compute

Integrate seamlessly with the compute engine you choose: Databricks, Trino, Spark, or any other

Quality Outcomes Guaranteed

Run quality checks on pipeline results before promoting the data to production

Book a demo and explore lakeFS for Databricks
