The data version control system of choice for Databricks users
lakeFS for Databricks enables rapid data delivery and makes data management possible across your entire Databricks environment.
lakeFS, an official Databricks Technology Partner, provides a scalable data version control system for all your data use cases
Databricks Unity Catalog
Version control your catalog for seamless dataset management from dev to production
Delta Lake Support
Version across multiple tables and unlock multi-table time travel
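To illustrate multi-table time travel, here is a hedged PySpark sketch that reads two Delta tables pinned to the same lakeFS commit. The repository name, commit ID, and table paths are placeholders, and it assumes the cluster is already configured against the lakeFS S3-compatible gateway (see the Databricks Notebooks sketch below).

```python
# Minimal sketch: two Delta tables read at one lakeFS commit, so they
# time-travel together. Repo name, commit ID, and paths are placeholders;
# `spark` is the SparkSession provided by the Databricks runtime.
commit_id = "abc123"  # a lakeFS commit that captured both tables in a single change

orders    = spark.read.format("delta").load(f"s3a://analytics/{commit_id}/tables/orders")
customers = spark.read.format("delta").load(f"s3a://analytics/{commit_id}/tables/customers")

# Both DataFrames reflect the exact state recorded by that single commit.
orders.join(customers, "customer_id").show()
```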
Databricks Workflows
Embed lakeFS in workflows and make versioning an integral part of your data pipeline
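As one hedged illustration of embedding lakeFS in a Workflows run, the sketch below uses the high-level `lakefs` Python SDK. The repository, branch, and run names are placeholders; the SDK expects your lakeFS endpoint and credentials to be configured (for example via environment variables), and the exact method names should be checked against the SDK version you install.

```python
import lakefs  # high-level lakeFS Python SDK

repo = lakefs.repository("pipeline-data")  # placeholder repository name

# Branch off production at the start of the run: zero-copy isolation.
run_branch = repo.branch("job-run-1234").create(source_reference="main")

# ... the job's tasks write their outputs to the run branch here ...

# Commit the results with metadata linking back to the Workflows run.
run_branch.commit(
    message="daily aggregation job",
    metadata={"workflow": "daily-agg", "run_id": "1234"},
)

# Promote to production only once the run has succeeded.
run_branch.merge_into(repo.branch("main"))
```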
MLflow Experiments
Track data and model versions in MLflow and ensure safe, reproducible ML experiments
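One hedged way to tie the two together (repository, commit, and dataset path below are placeholders): tag each MLflow run with the lakeFS commit the training data was read from, so the experiment can later be reproduced against the exact same dataset.

```python
import mlflow  # available on Databricks ML runtimes

data_repo   = "ml-data"   # placeholder lakeFS repository
data_commit = "abc123"    # lakeFS commit ID of the frozen training dataset

with mlflow.start_run():
    # Record the data version alongside the model run for reproducibility.
    mlflow.set_tag("lakefs_repo", data_repo)
    mlflow.set_tag("lakefs_commit", data_commit)

    train_df = spark.read.parquet(f"s3a://{data_repo}/{data_commit}/datasets/train/")
    # ... train and log the model here ...
    mlflow.log_metric("training_rows", train_df.count())
```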
Databricks Notebooks
Access lakeFS from within your Databricks Notebooks and keep your data and notebook versions synced
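A hedged notebook sketch of one common setup: pointing the Spark session at the lakeFS S3-compatible gateway so lakeFS paths read like any other bucket. The endpoint, keys, repository, and branch are placeholders, and credentials should come from a Databricks secret scope rather than notebook source.

```python
# `sc` and `spark` are provided by the Databricks notebook runtime.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "https://lakefs.example.com")   # your lakeFS endpoint
hadoop_conf.set("fs.s3a.access.key", "<lakeFS access key>")
hadoop_conf.set("fs.s3a.secret.key", "<lakeFS secret key>")
hadoop_conf.set("fs.s3a.path.style.access", "true")

# Paths follow s3a://<repository>/<branch, tag, or commit>/<object path>
df = spark.read.parquet("s3a://example-repo/main/datasets/events/")
df.show()
```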
Support for Databricks Compute Options
Run lakeFS across all Databricks compute options
Use Git-like operations to gain control over your data
Data Engineering
ML & AI
Analytics
Increase your data engineering velocity
Isolated Pipeline Development
Develop data pipelines in isolation without interfering with production data
Write-Audit-Publish Data with Databricks Jobs
Manage data flows and ensure data quality before your pipelines are deployed to production
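To make write-audit-publish concrete, a hedged sketch of the audit step (placeholder names throughout): the job writes to a staging branch created as in the Databricks Workflows sketch above, validates the staged table, and only then commits and merges into production.

```python
import lakefs

repo        = lakefs.repository("pipeline-data")   # placeholder repository
branch_name = "audit-orders-run-1234"              # staging branch from the sketch above
staged_path = f"s3a://pipeline-data/{branch_name}/tables/orders"

# Audit: quality checks run against the staged table, invisible to readers of main.
staged = spark.read.format("delta").load(staged_path)
assert staged.count() > 0, "audit failed: empty output"
assert staged.filter("order_id IS NULL").count() == 0, "audit failed: null order IDs"

# Publish: only a passing audit commits and merges the change into production.
staging = repo.branch(branch_name)
staging.commit(message="orders batch passed audits")
staging.merge_into(repo.branch("main"))
```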
Multi Delta-Table Transactions
Record changes made to multiple Delta tables as part of a logical pipeline step and leverage multi-table time travel
Versioned Medallion Architecture
Use distinct repositories for the Bronze, Silver, and Gold layers, and commit metadata to track the lineage of data changes
Deliver ML projects to production faster
Data Preparation in Isolation
Track all preprocessing changes and ensure only valid data reaches production
Parallel ML Experimentation
Run multiple experiments simultaneously, using different dataset versions, without duplicating data
Machine Learning Data Reproducibility
Maintain consistent datasets while adjusting model parameters, and track both in the Databricks ML Experiments view
Fast Data Loading for Deep Learning Workloads
Localize data to reduce latency and cut costs by optimizing GPU utilization
Advanced Unstructured Data Filtering
Simplify model development by filtering objects using custom tags
Increase data quality with engineering best practices
Test Before Deploying
Run quality checks on your data before going to production
Effortless Team Collaboration
Easily share and edit specific data versions without stepping on each other’s toes
Fast Error Recovery
Quickly roll back to a stable state after data errors, keeping operations smooth with minimal downtime
Reliable Data Audit Trails
Maintain a comprehensive log of all production data changes, including who made them and why
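A hedged sketch of reading that trail with the lakeFS Python SDK; the `log` call and commit fields below reflect my understanding of the SDK surface and should be verified against the version you use.

```python
import lakefs

repo = lakefs.repository("pipeline-data")   # placeholder repository

# Walk the commit history of the production branch: who changed what, and why.
for commit in repo.branch("main").log():
    print(commit.id, commit.committer, commit.message, commit.metadata)
```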
Keep Your Production Data Safe
Develop and test ETL changes on production data without actually modifying or copying data
Guaranteed performance, flexibility and quality you can control
Support All Data Formats
Work with any of your data formats: plain text, open table formats, images, videos, you name it
Scalable and Performant
lakeFS supports billions of objects with negligible impact on critical-path storage operations
Data Stays in Place
No need to lift and shift - lakeFS manages your data wherever you store it, in the cloud or on-premises
Remote Compute
Integrate seamlessly with compute engines. Use Databricks, Trino, Spark or any other compute engine you choose
Quality Outcomes Guaranteed
Run quality checks on pipeline results before promoting the data to production
Featured real-world use cases
Learn from industry leaders and watch how lakeFS for Databricks works