



Product

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer

Combining lakeFS and Spark brings a new standard of scale and elasticity to distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the release of the lakeFS FileSystem! This native Hadoop FileSystem implementation […]
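The excerpt above announces a native Hadoop FileSystem implementation for Spark. As a hedged sketch of what the Spark-side wiring for such a FileSystem might look like (the endpoint, credentials, and exact configuration keys below are placeholders for illustration, not taken from the post), a `spark-defaults.conf` fragment could read:

```properties
# Hypothetical spark-defaults.conf fragment wiring Spark to the lakeFS
# Hadoop FileSystem; endpoint and credential values are placeholders.
spark.hadoop.fs.lakefs.impl          io.lakefs.LakeFSFileSystem
spark.hadoop.fs.lakefs.endpoint      https://lakefs.example.com/api/v1
spark.hadoop.fs.lakefs.access.key    <lakefs-access-key-id>
spark.hadoop.fs.lakefs.secret.key    <lakefs-secret-access-key>
```

With configuration along these lines, jobs address data through branch-scoped paths of the form `lakefs://<repo>/<branch>/<object>`, so the same job can run against different branches of the same data.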

Product

Air & Water: The Airflow and lakeFS Integration

Itai Admi

Today we are excited to announce the official release of the lakeFS Airflow provider! This package lets you easily integrate lakeFS functionality into your Airflow DAGs. The library is published on PyPI, so it can be installed in your project with the command: pip install airflow-provider-lakefs. Once installed, you are […]

Product Tutorials

Power Amazon EMR Applications with Git-like Operations Using lakeFS

Itai Admi

This article will provide a detailed explanation of how to use lakeFS with Amazon EMR. Today, it’s common to manage a data lake using cloud object stores like AWS S3, Azure Blob Storage, or Google Cloud Storage as the underlying storage service. Each cloud provider offers a set of managed services to simplify the way […]

Product

Building Reproducible Data Pipelines with Airflow and lakeFS

Guy Hardonag

Update (May 26th, 2021): We officially released the lakeFS Airflow provider. Read all about it in the latest blog post. In this post, we’ll see how easy it is to use lakeFS with an existing Airflow DAG, to make every step in a pipeline completely reproducible in both code and data. This is done without […]
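The excerpt describes making every step of an Airflow DAG reproducible in both code and data. One common way this is done with lakeFS is a branch-per-run pattern: each DAG run writes to its own lakeFS branch, so any run's inputs and outputs can be revisited later. The helpers below are a minimal illustrative sketch of that naming convention (the function names and path format are assumptions for illustration, not the provider's API):

```python
# Hypothetical helpers sketching the branch-per-run pattern: each Airflow
# DAG run gets a dedicated lakeFS branch, keeping every run reproducible.
# Names and the path format are illustrative, not the provider's API.

def run_branch(dag_id: str, run_id: str) -> str:
    """Derive a lakeFS-safe branch name for a specific DAG run."""
    # lakeFS branch names avoid characters like ':' and '+' that appear
    # in Airflow run IDs, so normalize them to '-'.
    safe = run_id.replace(":", "-").replace("+", "-")
    return f"{dag_id}-{safe}"

def lakefs_path(repo: str, branch: str, key: str) -> str:
    """Build an object path of the form lakefs://repo/branch/key."""
    return f"lakefs://{repo}/{branch}/{key}"
```

A task at the start of the DAG would create the branch, downstream tasks would read and write under its `lakefs_path`, and a final task would commit (and optionally merge) the branch.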

Product

Git-like Operations Over MinIO with lakeFS

Yoni Augarten

lakeFS is an open source tool that delivers resilience and manageability to object-storage based data lakes. lakeFS provides Git-like operations over your MinIO storage environment and works seamlessly with all modern data frameworks such as Spark, Hive, Presto, Kafka, R, and native Python. Common use cases include creating a development environment without copying or mocking […]

Product

The Quick Guide for Running Presto Locally on S3

Guy Hardonag

This post aims to cover our experience running Presto in a local environment with the ability to query Amazon S3 and other S3-compatible systems. TL;DR: If you just want to use the environment, you can skip to the example. Context: As part of developing lakeFS, we needed to ensure that its API […]
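The excerpt describes pointing a local Presto at S3-compatible storage. In Presto's Hive connector this is done through a catalog properties file; the fragment below is a hedged sketch (the endpoint, metastore address, and credential values are placeholders, not taken from the post):

```properties
# Hypothetical etc/catalog/hive.properties for a local Presto querying an
# S3-compatible endpoint (e.g. MinIO); values are placeholders.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.endpoint=http://localhost:9000
hive.s3.aws-access-key=<access-key>
hive.s3.aws-secret-key=<secret-key>
hive.s3.path-style-access=true
```

Path-style access is typically required for non-AWS endpoints, since local S3-compatible servers usually don't resolve bucket-name subdomains.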
