Increase data quality and reduce the painful cost of errors
Transform your data lake into a Git-like repository
The lakeFS open source project for data lakes allows data versioning, rollback, debugging, testing in isolation, and more – all in one.
1M+ Downloads
1.5M+ Daily API calls per installation
2.4K GitHub Stars
2800+ Community members
3000+ Installations
Data versioning at scale
Data is transient, and dealing with it is an inefficient, manual task. With lakeFS, your data lake is versioned and you can easily time-travel between consistent snapshots of the lake.
- DEVELOPMENT: Develop on top of production data, in isolation
- DEPLOYMENT: Deploy data with confidence
- PRODUCTION: Effective troubleshooting and reverting in production
Works seamlessly with today’s data stack
Manage your data like code
Your data stays in place while lakeFS provides highly scalable, format-agnostic, zero-copy Git-like operations over it
Get Git-like operations for your data, with lakeFS
- Branches
Experiment
Instantly get a “copy” of your company’s data to debug or experiment
Test
Create an isolated snapshot of the data to debug issues
Collaborate
Work with your team on an isolated version of the data lake that you can all easily refer to
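
To make the idea of an instant "copy" concrete, here is a minimal sketch of zero-copy branching. It is a toy model, not the lakeFS API: the `ToyRepo` class, the bucket addresses, and the file paths are all made up for illustration. It assumes a branch is only a pointer to a commit, so creating one copies no data.

```python
# Toy model only: NOT the lakeFS API, just an illustration of zero-copy branching.
class ToyRepo:
    def __init__(self, objects):
        # A commit is an immutable mapping of path -> object address.
        self.commits = {"c0": dict(objects)}
        # A branch is only a pointer to a commit.
        self.branches = {"main": "c0"}
        # Uncommitted changes, kept separately per branch.
        self.staged = {}

    def create_branch(self, name, source):
        # No data is copied: the new branch points at the same commit.
        self.branches[name] = self.branches[source]

    def write(self, branch, path, address):
        # Writes are recorded on the branch only, so other branches are unaffected.
        self.staged.setdefault(branch, {})[path] = address

    def read(self, branch, path):
        # Staged changes on the branch win; otherwise fall back to its commit.
        staged = self.staged.get(branch, {})
        if path in staged:
            return staged[path]
        return self.commits[self.branches[branch]][path]


# Hypothetical object addresses and paths.
repo = ToyRepo({"events/2024.parquet": "s3://bucket/obj-1"})
repo.create_branch("experiment", source="main")
repo.write("experiment", "events/2024.parquet", "s3://bucket/obj-2")

print(repo.read("main", "events/2024.parquet"))
print(repo.read("experiment", "events/2024.parquet"))
```

In this sketch the experiment branch sees its own version of the file while main keeps reading the original object, which is the isolation the Experiment, Test, and Collaborate use cases rely on.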




- Merges & Hooks
Best Practices & Data Quality
Expose changes to consumers after quality has been assured with pre-merge hooks
Version control
Create a discoverable history of the data lake with an ordered set of versions, and ensure clear communication on which versions are used where
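
A pre-merge quality gate can be sketched as follows. This is a toy model under stated assumptions, not the lakeFS hooks mechanism: the `merge` function, the `MergeRejected` exception, and the `no_null_ids` check are hypothetical names invented for the example. The point it illustrates is that consumers of the target branch only see changes that passed every check.

```python
# Toy model only: illustrates a pre-merge quality gate, not the lakeFS hooks API.
class MergeRejected(Exception):
    """Raised when a pre-merge check fails; the target branch is left untouched."""


def no_null_ids(rows):
    # Hypothetical quality check: every row must carry a non-null id.
    return all(row.get("id") is not None for row in rows)


def merge(branches, source, target, pre_merge_hooks):
    candidate = branches[source]
    # Run every hook before the target branch is updated.
    for hook in pre_merge_hooks:
        if not hook(candidate):
            raise MergeRejected(f"{hook.__name__} failed; {target} is unchanged")
    # Only data that passed all checks becomes visible on the target.
    branches[target] = candidate


branches = {
    "main": [{"id": 1}],
    "etl-run": [{"id": 1}, {"id": None}],  # contains a bad row
}

try:
    merge(branches, "etl-run", "main", [no_null_ids])
except MergeRejected as err:
    print(err)

# After the bad row is fixed on the isolated branch, the merge goes through.
branches["etl-run"][1]["id"] = 2
merge(branches, "etl-run", "main", [no_null_ids])
print(branches["main"])
```

The first merge attempt is rejected and main stays as it was; the second succeeds once the data on the isolated branch is clean.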
- Commits & Reverts
Rollback
Recover from errors by instantly reverting data to a former, consistent snapshot of the data lake
Troubleshoot
Investigate production errors by first reproducing the state of the data at the time of failure
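
Rollback and troubleshooting both depend on an ordered history of consistent snapshots. The sketch below is a simplified toy model, not the lakeFS API: `ToyBranch`, its methods, and the file names are hypothetical, and a list index stands in for a commit ID. It assumes that reverting is recorded as a new commit, so the faulty snapshot stays available for investigation.

```python
# Toy model only: an append-only commit log, not the lakeFS API.
class ToyBranch:
    def __init__(self):
        # Ordered history of (message, snapshot) pairs.
        self.log = []

    def commit(self, message, snapshot):
        # Store a copy so later changes cannot alter a committed snapshot.
        self.log.append((message, dict(snapshot)))
        return len(self.log) - 1  # list index used as a stand-in commit ID

    def at(self, commit_id):
        # Time travel: read the snapshot exactly as it was at that commit.
        return self.log[commit_id][1]

    def head(self):
        return self.log[-1][1]

    def revert_to(self, commit_id):
        # Reverting adds a new commit that restores the old snapshot,
        # so the history (including the bad commit) is preserved.
        return self.commit(f"revert to {commit_id}", self.at(commit_id))


branch = ToyBranch()
good = branch.commit("daily load", {"sales.parquet": "v1"})
bad = branch.commit("buggy load", {"sales.parquet": "v2-corrupt"})

# Rollback: consumers see the last consistent snapshot again.
branch.revert_to(good)
print(branch.head())

# Troubleshoot: the failing state can still be reproduced from history.
print(branch.at(bad))
```

Keeping the bad commit in the log is a deliberate choice in this sketch: the rollback restores service right away, and the failure can be debugged later against the exact data that caused it.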


Stay updated
Towards Effective DataOps
Gain the confidence to mess with your data without making a mess of your data. “If it hurts, do it more often.” is a...
Clearing the mess – How to ensure data quality with versioning
The last decade saw an unprecedented rise in the number of organizations that base their decisions and operations on data. The number...
5 Painful mistakes data engineers make, and how to avoid them
In today’s world of data engineering, we need to store more than just simple text information in relational or non-relational databases, tables...