
lakeFS Community

Webinar

CI/CD Safety Net for Databricks ETLs

The lakeFS Team

Automated Testing in Isolated Environments with GitHub Actions and lakeFS. Back by popular demand, we’re hosting a live community training session where we’ll demonstrate how to build a bulletproof ETL development process. Deploying new ETL code to production can be nerve-wracking: will it handle real-world data volumes and complexities? In this live webinar, […]
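The branch-per-test-run pattern this session teases can be sketched as a GitHub Actions job. This is a minimal, hypothetical workflow, not the code from the webinar: it assumes a lakeFS repository named `example-repo`, `lakectl` already installed on the runner, and credentials stored as repository secrets (all names illustrative).

```yaml
name: etl-tests
on: [pull_request]

jobs:
  test-etl:
    runs-on: ubuntu-latest
    env:
      LAKECTL_SERVER_ENDPOINT_URL: ${{ secrets.LAKEFS_ENDPOINT }}
      LAKECTL_CREDENTIALS_ACCESS_KEY_ID: ${{ secrets.LAKEFS_ACCESS_KEY_ID }}
      LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY: ${{ secrets.LAKEFS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      # Create an isolated, zero-copy branch of the data lake for this run.
      - name: Create test branch
        run: |
          lakectl branch create \
            lakefs://example-repo/test-${{ github.run_id }} \
            --source lakefs://example-repo/main
      # Run the ETL against lakefs://example-repo/test-<run_id>/ and
      # validate its output here; merge to main only if the checks pass.
```

Because the branch is copy-on-write, creating it costs nothing regardless of how much data `main` holds, which is what makes testing against full production-scale data practical.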

Streamline Spark Development with lakeFS’s Enhanced Python Client

The lakeFS Team

Delivering high-quality data products requires strict testing of pipelines before deploying them into production. Today, to test ETLs, you either need to use a subset of the data, or you are forced to create multiple copies of the entire dataset. Testing against sample data is not good enough. The alternative — testing against your
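As a rough sketch of what branch-based test isolation looks like with the lakeFS Python SDK (the `lakefs` package) — the repository name, branch-naming scheme, and pipeline hook below are illustrative assumptions, not the exact code from the session:

```python
def branch_name(pipeline: str, run_id: str) -> str:
    """Per-run branch name so concurrent test runs don't collide."""
    return f"test-{pipeline}-{run_id}"


def run_pipeline_in_isolation(repo_name: str, pipeline: str, run_id: str) -> None:
    # Imported lazily so branch_name() stays usable without a lakeFS server.
    import lakefs  # pip install lakefs (high-level SDK)

    repo = lakefs.repository(repo_name)
    # Zero-copy: the branch shares objects with main until something changes.
    branch = repo.branch(branch_name(pipeline, run_id)).create(source_reference="main")
    try:
        # ... point Spark at lakefs://{repo_name}/{branch.id}/ and run the ETL ...
        branch.commit(message=f"test outputs for {pipeline} run {run_id}")
        branch.merge_into(repo.branch("main"))  # promote only if tests passed
    finally:
        branch.delete()  # discard the test branch either way
```

The test branch gives the pipeline the full production dataset without copying it, and nothing reaches `main` unless the merge at the end runs.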

Advancing Data Governance: RBAC for Data Lakes

The lakeFS Team

Join us in this 2-part series. These sessions will break down everything you need to know about data governance for data lakes. We will showcase how to quickly secure your version-controlled data lakes using the RBAC functionality of lakeFS. Role-Based Access Control (RBAC), also known as role-based security, is a mechanism that

lakeFS & Airflow Integration: What’s New?

The lakeFS Team

Join us for an exciting session where we’ll dive into the latest developments in the lakeFS and Apache Airflow integration. These changes include enhancements that enable seamless bi-directional data management between Airflow and lakeFS. What we’ll cover: Introduction to lakeFS and Apache Airflow: Gain an understanding of the fundamental concepts and benefits of this integration.

Workshop: Master Data Pipeline Version Control

The lakeFS Team

How to Master the Medallion Architecture with lakeFS Against Your Cloud. Join us for an engaging and insightful workshop as we delve into the Medallion Architecture implementation using lakeFS to enable version control across different data layers (Gold, Silver, Bronze). In this session we will cover: Introduction to the Medallion Architecture: Gain a

Promote only high quality data to production

The lakeFS Team

Engineering best practices dictate having an isolated staging environment. And yet today, data transformation is most often done directly on production data. Moreover, even if the code and infrastructure don’t change, the data might, and those changes introduce potential quality issues. In this webinar, you will learn: How to create a staging environment for your

Version your ML training data for easy reproducibility

The lakeFS Team

Most data science and machine learning workflows are not linear. ML experimentation is an iterative process, and you have to go back and forth between different components. Most of us experiment with different data labeling methods, data cleaning and pre-processing techniques, and various feature selection methods during model training to arrive at an accurate model.

Create a Dev/Test Environment for Data Pipelines Using Spark and Python

Keren Shohet

Delivering high-quality data requires strict testing of pipelines before deploying them into production. Today, in order to test ETLs, one either needs to use a subset of the data, or is forced to create multiple copies of the entire data. Testing against sample data is not good enough. The alternative, however, is costly and time

Troubleshoot & Reproduce Data with Prefect & lakeFS

Keren Shohet

Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines. lakeFS is a scalable data version control system that brings Git-like semantics to the data lake, letting you create and access versions of your data. When using lakeFS with Prefect-orchestrated pipelines, you’ll be able to quickly analyze and
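One way the integration described above can be wired up — a hypothetical sketch, not the webinar's code: give each Prefect flow run its own lakeFS branch, so a failed run's data can later be inspected exactly as the pipeline saw it. Assumes `pip install prefect lakefs`; the repository and task names are illustrative.

```python
def branch_for_run(flow_name: str, run_id: str) -> str:
    """Turn a Prefect flow-run id into a valid lakeFS branch name."""
    # Keep letters, digits, '-' and '_'; replace anything else with '-'.
    safe = "".join(c if c.isalnum() or c in "-_" else "-" for c in run_id)
    return f"{flow_name}-{safe}"


def troubleshootable_etl(repo_name: str = "example-repo") -> None:
    # Imported lazily so branch_for_run() stays testable without the services.
    import lakefs
    from prefect.runtime import flow_run  # id of the current flow run

    repo = lakefs.repository(repo_name)
    branch = repo.branch(branch_for_run("etl", flow_run.id)).create(
        source_reference="main"
    )
    # ... write pipeline outputs to lakefs://example-repo/<branch>/ ...
    branch.commit(message=f"ETL outputs for flow run {flow_run.id}")
    # On success, merge; on failure, leave the branch for troubleshooting.
    branch.merge_into(repo.branch("main"))
```

Wrapped in a Prefect `@flow`, this makes every run's inputs and outputs a committed, addressable version — reproducing a bad run is just reading from its branch.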
