Integrations

Integrations Tutorials

Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial

Iddo Avneri
January 11, 2023

Introduction: This tutorial reviews all the steps needed to configure lakeFS on Databricks. It assumes that lakeFS is already set up and running against your storage (in this example, AWS S3), and focuses on setting up the Databricks and lakeFS integration. Prerequisites Step 1 – Acquire lakeFS Key and Secret In this step, …


Announcements Integrations

lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI

Oz Katz
February 5, 2023

We’re happy to introduce experimental support for running SQL queries directly on objects (your files!) in lakeFS, all through the lakeFS UI, with nothing to install or configure. TL;DR – you can now run SQL queries on Parquet and other tabular formats, directly from the lakeFS UI! Explore data, look at its schema, compare …

lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI Read More »

Integrations Use Cases

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri
December 6, 2022

Apache Airflow enables you to build multistep workflows across multiple technologies. The programmatic approach, allowing you to schedule and monitor workflows, helps users build complicated ETLs on their data that would be difficult to achieve otherwise. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multi-step, advanced transformations: The challenge …


Data Engineering Integrations

One Spark job, Many Data Sources – How to Easily Use lakeFS with Spark

Jonathan Rosenberg, Tal Sofer
August 15, 2022

lakeFS is an interface to the data lake, or to the parts of the data lake one chooses to version control. The lakeFS interface is S3-compatible and can therefore be used easily with all common data applications, including Spark. In some cases, lakeFS is first adopted by the teams responsible for the data ingested to the lake, …

One Spark job, Many Data Sources – How to Easily Use lakeFS with Spark Read More »
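Because the lakeFS interface is S3-compatible, Spark can reach it through its existing S3A client. A minimal configuration sketch, where the endpoint, credentials, and repository name are placeholders rather than values from this post:

```properties
# spark-defaults.conf — sketch only; endpoint, keys, and repo name
# below are hypothetical placeholders
spark.hadoop.fs.s3a.endpoint            https://lakefs.example.com
spark.hadoop.fs.s3a.access.key          <lakeFS access key>
spark.hadoop.fs.s3a.secret.key          <lakeFS secret key>
spark.hadoop.fs.s3a.path.style.access   true
```

With this in place, a read such as `spark.read.parquet("s3a://example-repo/main/path/to/table")` addresses objects on the `main` branch of the hypothetical `example-repo` repository.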

Data Engineering Integrations

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Paul Singman, Guy Hardonag
September 15, 2022

Introduction: Dockerize Your Data Pipeline I can remember times when my company started using a new technology — be it Redis, Kafka, or Spark — and in order to try it out I found myself staring at a screen like this: At the time I thought nothing of doing this. And even wore it as a badge of pride …


Integrations

dbt Tests – Create Staging Environments for Flawless Data CI/CD

Guy Hardonag, Paul Singman
May 11, 2022

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s recent support for big data compute tools like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination …


Integrations

Seamlessly Sync Data Into Your lakeFS Repos With Airbyte

Itai Admi
May 17, 2022

New features in Airbyte and lakeFS make it easy to send data replicated by Airbyte into a lakeFS repo. See how to leverage this integration in your data pipelines! If you work in data, chances are you rely on replicating data between different systems to centralize it for analysis. Modern companies produce data from all …


Integrations Machine Learning

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman
November 21, 2022

Introducing Kubeflow and lakeFS Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow will consist of isolated components for each stage of the ML lifecycle. And each component of a Kubeflow pipeline is packaged as a Docker image and executed in a …

