Increase data quality and reduce the painful cost of errors

Data engineering best practices
using Git-like operations on data

lakeFS is an open source data version control system for data lakes.

It enables zero-copy, isolated dev/test environments, continuous quality validation, atomic rollback of bad data, reproducibility, and more.

Big Data engineering requires data version control

Data is transient, and managing it is often an inefficient, manual task. With lakeFS, your data lake is versioned and you can easily time-travel between consistent snapshots of the lake.

EASIER ETL TESTING

Test your ETL on top of production data, in isolation

Safely experiment, test, and collaborate with your team on full production data without incurring extra storage costs.
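
To make "in isolation" and "no extra storage" concrete, here is a minimal sketch of zero-copy branching. It is a toy in-memory model, not the lakeFS API: ToyRepo and its methods are hypothetical names chosen for illustration. The assumption it illustrates is that a branch is only a mapping from logical paths to stored objects, so creating one copies pointers rather than data.

```python
class ToyRepo:
    """Toy model: a branch maps logical paths to physical object addresses."""

    def __init__(self):
        self.objects = {}             # physical address -> bytes, stored once
        self.branches = {"main": {}}  # branch name -> {path: address}

    def put(self, branch, path, data):
        # Every write creates a new object; existing objects are never mutated.
        address = f"obj-{len(self.objects)}"
        self.objects[address] = data
        self.branches[branch][path] = address

    def create_branch(self, name, source):
        # Copy only the path -> address mapping, never the data itself.
        self.branches[name] = dict(self.branches[source])

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]


repo = ToyRepo()
repo.put("main", "events/2023-01.parquet", b"production rows")

# Branching stores no new objects; the test branch points at production data.
repo.create_branch("etl-test", source="main")
objects_after_branch = len(repo.objects)

# A write on the test branch is invisible to main.
repo.put("etl-test", "events/2023-01.parquet", b"experimental rows")
```

In this sketch the only new storage is the object written by the experiment itself; main still reads the original production object.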

CI/CD FOR DATA

Promote only high-quality data to production

Automate data quality checks and ensure that only validated data is pushed to production while bad data is kept out.
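
The gate described above can be sketched as a check that runs before promotion. This is an illustrative toy, not lakeFS code: validate, promote, the branch names, and the "amount" rule are all hypothetical, and branches are modeled as plain lists of rows.

```python
def validate(rows):
    """Hypothetical quality rule: every row needs a positive numeric amount."""
    return [
        f"row {i}: bad amount {row.get('amount')!r}"
        for i, row in enumerate(rows)
        if not isinstance(row.get("amount"), (int, float)) or row["amount"] <= 0
    ]


def promote(branches, source, target):
    """Copy source into target only if validation passes; otherwise change nothing."""
    errors = validate(branches[source])
    if errors:
        return False, errors
    branches[target] = list(branches[source])
    return True, []


branches = {
    "main": [{"amount": 10}],
    "ingest": [{"amount": 10}, {"amount": None}],
}

# The bad row blocks promotion, so main keeps its previous contents.
first_ok, first_errors = promote(branches, "ingest", "main")
main_after_failure = list(branches["main"])

# After the row is fixed on the ingest branch, promotion succeeds.
branches["ingest"][1]["amount"] = 5
second_ok, second_errors = promote(branches, "ingest", "main")
```

The design point is that validation happens on the isolated branch, so a failed check never touches the data consumers read from.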

REPRODUCIBLE EXPERIMENTS

Re-run experiments, regardless of their version

Time travel through your data to any state your experiments were in during development, so past experiments are easy to reproduce.
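
As a rough illustration of time travel, the sketch below records immutable snapshots and reads any of them back by ID. It is a toy model under stated assumptions, not the lakeFS implementation: ToyHistory, commit, and checkout are hypothetical names, and a snapshot is simply a dictionary from file name to a version label.

```python
class ToyHistory:
    """Toy commit log: each commit is an immutable snapshot of the data set."""

    def __init__(self):
        self.commits = []

    def commit(self, snapshot):
        # Store a copy so later changes by the caller cannot alter history.
        self.commits.append(dict(snapshot))
        return len(self.commits) - 1  # the commit ID

    def checkout(self, commit_id):
        # Return a copy so readers cannot alter history either.
        return dict(self.commits[commit_id])


history = ToyHistory()
data = {"train.csv": "rows-v1"}
experiment_1 = history.commit(data)

# The data set keeps evolving after the first experiment.
data["train.csv"] = "rows-v2"
data["labels.csv"] = "labels-v1"
experiment_2 = history.commit(data)

# Reproduce the first experiment by reading exactly what it saw.
inputs_for_rerun = history.checkout(experiment_1)
```

Recording the commit ID alongside an experiment's results is what makes the rerun possible: the ID names one fixed state of the data, whatever has changed since.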

Data Version Control that works seamlessly with today’s data stack

lakeFS is fully compatible with a wide ecosystem of data engineering tools and technologies.

3000+

Installations

2.2K

GitHub Stars

1800+

Community members

Trusted by

Karius, Similarweb, Windward

Manage your data like code
with data version control

Your data stays in place while lakeFS provides highly scalable, format-agnostic, zero-copy data version control over it.

20%-80%

Storage Cost Reduction

2x

Data Engineering Efficiency

2 Seconds

Average time to rollback bad data

Seamless integration with
your entire data stack

Object Storage
Compute Engines
Ingest Technologies
Data Storage Formats
Orchestration & Workflow
Research and ML
Data Quality

All common ingest technologies are integrated into lakeFS.

lakeFS is format-agnostic: whatever format you’re using, lakeFS supports it.

Manage orchestration and workflows better with the popular orchestration tools that lakeFS supports.

Data quality is essential to the health of your data lake. Ensure and maintain the highest data quality with lakeFS.

Git for Data – lakeFS
