Increase data quality and reduce the painful cost of errors

Data engineering best practices using Git-like operations on data

lakeFS is an open-source data version control system for data lakes.

It enables zero-copy, isolated dev/test environments, continuous quality validation, atomic rollback of bad data, reproducibility, and more.

Big Data engineering requires data version control

Data in a lake is transient, and managing it is often an inefficient, manual task. With lakeFS, your data lake is versioned, so you can easily time-travel between consistent snapshots of the lake.

EASIER ETL TESTING

Test your ETL on top of production data, in isolation

Safely experiment, test, and collaborate with your team on full production data without incurring extra storage costs.
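To make the "zero-copy" idea concrete, here is a minimal toy model in Python. It is not the lakeFS API or its implementation: the `ToyRepo` class, its methods, and the paths are hypothetical names invented for illustration. It assumes a branch is only a mapping from logical paths to stored objects, so creating a branch copies pointers rather than data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of zero-copy branching (not the lakeFS API)."""

    # Object store: physical address -> bytes. Objects are written once.
    objects: dict = field(default_factory=dict)
    # Branch name -> {logical path: physical address}. Metadata only.
    branches: dict = field(default_factory=lambda: {"main": {}})

    def put(self, branch: str, path: str, data: bytes) -> None:
        # Every write creates a new object and repoints the path on one branch.
        address = f"obj-{len(self.objects)}"
        self.objects[address] = data
        self.branches[branch][path] = address

    def get(self, branch: str, path: str) -> bytes:
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copy the pointer table only; no object data is duplicated.
        self.branches[name] = dict(self.branches[source])


repo = ToyRepo()
repo.put("main", "events/day1.parquet", b"production rows")

repo.create_branch("etl-test")
objects_after_branching = len(repo.objects)  # branching stored nothing new

# The test ETL rewrites the file on its own branch; main is unaffected.
repo.put("etl-test", "events/day1.parquet", b"rewritten by the test ETL")
```

In this sketch the test branch sees all production data the moment it is created, and extra storage is consumed only for objects the test actually writes.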

CI/CD FOR DATA

Promote only high quality data to production

Automate data quality checks so that only validated data is promoted to production and bad data is kept out.
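The gating idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how lakeFS implements or configures its checks: the `validate` and `promote` functions and the `user_id` and `amount` columns are hypothetical. A candidate dataset replaces production only when every check passes; otherwise production is left untouched.

```python
def validate(rows):
    """Return a list of problems found in the candidate rows (hypothetical rules)."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("user_id") is None:
            problems.append(f"row {i}: missing user_id")
        amount = row.get("amount")
        if amount is None or amount < 0:
            problems.append(f"row {i}: missing or negative amount")
    return problems


def promote(production, candidate):
    """Return (new_production, problems); production changes only if checks pass."""
    problems = validate(candidate)
    if problems:
        # Bad data is kept out: production stays exactly as it was.
        return production, problems
    return list(candidate), []


production = [{"user_id": 1, "amount": 10.0}]

good = [{"user_id": 1, "amount": 10.0}, {"user_id": 2, "amount": 5.5}]
bad = [{"user_id": None, "amount": 3.0}, {"user_id": 3, "amount": -1.0}]

after_bad, bad_problems = promote(production, bad)
after_good, good_problems = promote(production, good)
```

Here the rejected batch leaves production at its previous state and reports why, while the clean batch is promoted as a whole.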

REPRODUCIBLE EXPERIMENTS

Re-run experiments, regardless of their version

Time-travel through your data to any state your experiments were in during development, making it easy to reproduce past experiments.

Data Version Control that works seamlessly with today’s data stack

lakeFS is fully compatible with a wide ecosystem of data engineering tools and technologies.

3000+

Installations

2.2K

GitHub Stars

1800+

Community members

Trusted by

Karius, Similarweb, Windward

Manage your data like code with data version control

Your data stays in place while lakeFS provides highly scalable, format-agnostic, zero-copy data version control over it.

20%-80%

Storage Cost Reduction

2x

Double Data Engineering Efficiency

2 Seconds

Average time to rollback bad data

Git for Data – lakeFS