Integrations

Here’s Something Diff-erent: lakeFS adds support for diff of Delta tables

Robin Moffatt

Delta Lake is one of the three new open-source table formats gaining wide adoption in the data engineering community, along with Apache Iceberg and Apache Hudi. In the recent release of lakeFS we’ve added support for comparing the state of Delta Lake tables so that you can see metadata for what has changed. Delta Diff …

Integrations Tutorials

Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial

Iddo Avneri

Introduction: This tutorial reviews all the steps needed to configure lakeFS on Databricks. It assumes that lakeFS is already set up and running against your storage (in this example, AWS S3) and focuses on setting up the Databricks and lakeFS integration. Prerequisites. Step 1 – Acquire lakeFS Key and Secret: In this step, …

Announcements Integrations

lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI

Oz Katz

We’re happy to introduce experimental support for running SQL queries, directly on objects (your files!) in lakeFS. All through the lakeFS UI. No need to install or configure anything. TLDR – You can now run SQL queries on Parquet and other tabular formats, directly from the lakeFS UI! Explore data, look at its schema, compare …

Integrations Use Cases

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri

Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach to scheduling and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to achieve automatically. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multi-step advanced transformations: The challenge …

Data Engineering Integrations

One Spark job, Many Data Sources – How to Easily Use lakeFS with Spark

Jonathan Rosenberg, Tal Sofer

lakeFS is an interface to the data lake, or the parts of the data lake one chooses to version control. The lakeFS interface is S3 compatible, and hence easily used with all common data applications, including Spark. In some cases, lakeFS is first adopted by the teams responsible for the data ingested to the lake, …

Data Engineering Integrations

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Paul Singman, Guy Hardonag

Introduction: Dockerize Your Data Pipeline. I can remember times when my company started using a new technology, be it Redis, Kafka, or Spark, and in order to try it out I found myself staring at a screen like this: At the time I thought nothing of doing this. I even wore it as a badge of pride …

Integrations

dbt Tests – Create Staging Environments for Flawless Data CI/CD

Guy Hardonag, Paul Singman

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s more recent support for big data compute tools like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination …

Integrations Machine Learning

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman

Introducing Kubeflow and lakeFS: Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a …
