Scalable Data Version Control

Manage your data as code using Git-like operations and achieve reproducible, high-quality data pipelines. Available as open source or in the cloud.

Take control of your data

Compute Engines

lakeFS supports all standard computation engines.

lakeFS

lakeFS uses metadata to manage data versions. Its versioning engine is highly scalable, with minor impact on storage performance.

Formats

lakeFS is format agnostic: it works with structured data, unstructured data, open table formats, or anything else.

Object Storage

lakeFS supports data in all object stores, including the major cloud providers (S3, Azure Blob, GCP) and on-prem systems such as MinIO, Ceph, Dell EMC, and any other S3-compatible storage.

Use Cases

lakeFS helps data engineers and data scientists in every field manage their data like code, at scale.

Robust Data Pre-Processing

Ensure that your pre-processing pipelines (data cleaning, outlier handling, filling in missing values, and so on) are robust and produce high-quality data.

Deduplicated Experimentation

Use lakeFS branches to run experiments in parallel with zero-copy clones in a fully deduplicated data lake, allowing you to effectively compare them to select the best one.
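To make the idea of zero-copy, deduplicated branches concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not lakeFS's implementation or API: the `ToyLake` class, its methods, and the file paths are all hypothetical. It shows one way a branch can be a set of pointers to stored content, so that creating a branch copies no data.

```python
import hashlib


class ToyLake:
    """Toy content-addressed store with pointer-only branches.

    Hypothetical illustration only; this is not how lakeFS is implemented.
    """

    def __init__(self):
        self.objects = {}             # content hash -> bytes, each stored once
        self.branches = {"main": {}}  # branch name -> {path: content hash}

    def put(self, branch, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)  # identical content is stored only once
        self.branches[branch][path] = digest

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name, source="main"):
        # Only the path -> hash mapping is copied; no object bytes are duplicated.
        self.branches[name] = dict(self.branches[source])


lake = ToyLake()
lake.put("main", "events/day1.csv", b"id,value\n1,10\n")

# Two parallel experiments start from the same data without copying it.
lake.create_branch("experiment-a")
lake.create_branch("experiment-b")
objects_after_branching = len(lake.objects)

# A change on one branch adds one new object and leaves the others untouched.
lake.put("experiment-a", "events/day1.csv", b"id,value\n1,11\n")
```

In this toy model, the number of stored objects does not grow when branches are created; it grows only when a branch writes content that has not been stored before.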

Reproducible Feature Engineering & Model Training

Commit the results of your experiments and use the lakeFS Git integration to reproduce any experiment with the right version of the data, the code and the model weights.

Isolated Dev/Test Environments

Create isolated dev/test environments using lakeFS branches and reduce your testing time by 80%.

Promote Only High Quality Data to Production

Implement CI/CD for data with lakeFS hooks, allowing for automation of quality validation checks.
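As a rough illustration of an automated quality gate, the Python sketch below runs a validation check and promotes data only when the check passes. It does not use the lakeFS hooks mechanism itself: the function names, the missing-value rule, and the in-memory "production" list are all assumptions made for the example.

```python
def check_no_missing_values(rows, required_fields):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing '{field}'")
    return problems


def promote_if_valid(candidate_rows, production_rows, required_fields):
    """Append candidate rows to production only when validation passes."""
    problems = check_no_missing_values(candidate_rows, required_fields)
    if problems:
        return False, problems  # promotion is blocked, production is unchanged
    production_rows.extend(candidate_rows)
    return True, []


production = []
ok, problems = promote_if_valid(
    [{"id": 1, "value": 10}], production, ["id", "value"]
)
rejected, reasons = promote_if_valid(
    [{"id": 2, "value": None}], production, ["id", "value"]
)
```

The first batch is promoted, while the second is rejected with a reason and never reaches the production list. In a real setup, the same kind of check would presumably run as a hook before a merge rather than as a direct function call.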

Fix Bad Data with Production Rollback

Save entire consistent snapshots of your data using commits, allowing you to roll back to a previous commit in case of bad data.
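The commit-and-rollback idea can be sketched with a small Python toy. This is an assumption-laden illustration rather than lakeFS code: `ToyBranch` and its methods are hypothetical, and the "data" is just strings keyed by path. Each commit stores an immutable snapshot, and a rollback moves the branch head back to an earlier snapshot.

```python
from types import MappingProxyType


class ToyBranch:
    """Toy branch whose commits are immutable snapshots (hypothetical, not lakeFS)."""

    def __init__(self):
        self.commits = []  # list of (message, read-only snapshot)
        self.staged = {}   # uncommitted working state: path -> value
        self.head = None   # index of the commit that readers currently see

    def write(self, path, value):
        self.staged[path] = value

    def commit(self, message):
        # Freeze a copy of the staged state so later writes cannot alter it.
        snapshot = MappingProxyType(dict(self.staged))
        self.commits.append((message, snapshot))
        self.head = len(self.commits) - 1
        return self.head

    def rollback(self, commit_id):
        # Point the head at an earlier snapshot and reset the working state to it.
        self.head = commit_id
        self.staged = dict(self.commits[commit_id][1])

    def read(self, path):
        return self.commits[self.head][1].get(path)


branch = ToyBranch()
branch.write("sales/2024.csv", "clean rows")
good = branch.commit("load clean data")

branch.write("sales/2024.csv", "corrupted rows")
branch.commit("load bad data")

# Bad data reached the branch; restore the last known-good snapshot.
branch.rollback(good)
```

After the rollback, readers see the contents of the earlier commit again, and the bad commit remains in the history for later inspection.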

lakeFS is already helping thousands of developers

UP TO 80%

Reduce storage costs

2X

Double efficiency

UP TO 99%

Increase production outage recovery

Seamless integration with your entire data stack

Object Storage
Compute Engines
Ingest Technologies
Data Storage Formats
Orchestration & Workflow
Research and ML
Data Quality

All common ingest technologies are integrated with lakeFS.

lakeFS is format agnostic: whatever format you're using, lakeFS supports it.

Manage orchestration and workflows better with the popular orchestration tools that lakeFS supports.

Data quality is essential to the health of your data lake. Ensure and maintain the highest data quality with lakeFS.

Need help getting started?

Git for Data – lakeFS
