Use Cases

Data Lake Governance at Scale with lakeFS

Iddo Avneri
March 6, 2023

Often, data lake platforms lack simple ways to enforce data governance. This is especially challenging because data governance requirements are complicated to begin with, even without the added complexity of managing data in a data lake. Enforcing them is therefore an expensive, …

Integrations Use Cases

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri
December 6, 2022

Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach, which lets you schedule and monitor workflows, helps users build complex ETLs on their data that would otherwise be difficult to automate. This enabled ETLs to evolve from simple single steps into complex, parallelized, multistep transformations. The challenge …

Tutorials Use Cases

How to Build an Isolated Testing Environment for Data with lakeFS

Barak Amar
March 14, 2023

Our routine work with data includes developing code, choosing and upgrading compute infrastructure, and testing new and changed data pipelines. Usually, this requires running the pipelines under test in parallel with production so we can validate the changes we want to apply. Every data engineer knows that this convoluted process requires copying data, manually updating …

Data Engineering Use Cases

How to Develop Spark ETL Pipelines in Isolation

Amit Kesarwani, Vino SD, Iddo Avneri
November 7, 2022

You’re bound to ask yourself this question at some point: Do I need to test the Spark ETLs I’m developing? The answer is yes; you certainly should, and not just with unit testing but also with integration, performance, load, and regression testing. Naturally, the scale and complexity of your data set matter a lot, so …
