lakeFS Blog

Data Architecture

OLTP: Guide to Enterprise Data Architecture Part 1

The lakeFS team

Data is a goldmine for every organization, no matter the industry. But to make the most of it, businesses need technology to maintain and manage transactional data like payments, inventory updates, and customer records. This is where OLTP databases come in. Online Transaction Processing (OLTP) databases are used to store and process large numbers of …

Data Engineering, Machine Learning, Tutorials

Backfilling Data: A Foolproof Guide to Managing Historical Data

Iddo Avneri

If you work with a smaller dataset or do one-off jobs, the way you manage backfills isn’t that crucial. But what if you face constantly growing datasets with billions to trillions of records? Your backfilling data strategy will have a much bigger impact. When dealing with modern data pipelines on such a scale, it’s key …

Announcements

New in lakeFS Version 0.100.0

The lakeFS team

We’re excited to announce the release of lakeFS version 0.100.0! It’s thanks to you, the community, that we’ve reached this milestone: 4,197 commits from 73 contributors. Let’s celebrate this 0.100.0 release in a BIG way. Install the 100th lakeFS v0.100.0 and …

Best Practices, Tutorials

Version Control Data Pipelines Using the Medallion Architecture

Iddo Avneri

A step-by-step guide to running pipelines on Bronze, Silver, and Gold layers with lakeFS. The Medallion Architecture is a software design pattern that organizes a data pipeline into three distinct tiers based on functionality: bronze, silver, and gold. The bronze tier represents the core functionality of the system, while the silver and …

Best Practices, Data Engineering

Applying Engineering Best Practices to Data Lakes

Einat Orr, PhD.

Over the last 30 years, agile development methodology has played a significant part in the digital transformation the world is undergoing. The basis of the methodology is the ability to iterate fast on product features, using the shortest possible feedback loop from ideation to user feedback. This short feedback loop allows us to …

Best Practices, Machine Learning, Tutorials

Building an ML Experimentation Platform for Easy Reproducibility Using lakeFS

Vino SD

MLOps is mostly data engineering. As organizations ride past the hype cycle of MLOps, we realize there is significant overlap between MLOps and data engineering. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models. Only 5% of developing and deploying …

Integrations

Here’s Something Diff-erent: lakeFS adds support for diff of Delta tables

Robin Moffatt

Delta Lake is one of the three new open-source table formats gaining wide adoption in the data engineering community, along with Apache Iceberg and Apache Hudi. In the recent release of lakeFS, we’ve added support for comparing the state of Delta Lake tables so that you can see metadata about what has changed. Delta Diff …
