lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data

Product

The lakeFS playground is now live and everybody can play!

Oz Katz, Michal Wosk

What if you could manage your data lake just like you manage code? With rollback, versioning, and branching capabilities on top of your existing data lake? lakeFS is an open-source project that provides a Git-like version control interface for data lakes, with seamless integration to most data tools and frameworks. lakeFS enables you to easily […]

Data Engineering Product

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Paul Singman, Guy Hardonag

Everything Bagel is now lakeFS-samples: you can find Everything Bagel, along with many hands-on examples for using lakeFS, in the lakeFS-samples repository. Introduction: Dockerize Your Data Pipeline. I can remember times when my company started using a new technology, be it Redis, Kafka, or Spark, and in order to try it

Product Tutorials

dbt Tests – Create Staging Environments for Flawless Data Write-Audit-Publish

Guy Hardonag, Paul Singman

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s recent support for big data compute engines like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination

Product

Building an Auto-Upgrading lakeFS Environment

Nirav Adunuthula

One look at the cheerful aquamarine axolotl resting atop the lakeFS homepage was all it took to assure me that an internship here would make this summer different from any other… This summer I had the opportunity to intern with the amazing developer team at Treeverse and work on the lakeFS project. This opportunity represents an important

Product

Seamlessly Sync Data Into Your lakeFS Repos With Airbyte

Itai Admi

New features in Airbyte and lakeFS make it easy to send data replicated by Airbyte into a lakeFS repo. See how to leverage this integration in your data pipelines! If you work in data, chances are you rely on replicating data between different systems to centralize it for analysis. Modern companies produce data from all

Product

New in lakeFS: Data Retention Policies

Yoni Augarten, Guy Hardonag

“I can remember everything. That’s my curse, young man. It’s the greatest curse that’s ever been inflicted on the human race: memory.” — Jedediah Leland, Citizen Kane (1941) lakeFS makes data corruptions easy to avoid and fix by allowing you to travel back in time to any state of your data. This new capability has

Machine Learning Product

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman

Introducing Kubeflow and lakeFS: Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a

Product

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer

Combining lakeFS and Spark brings a new standard of scale and elasticity to distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the release of the lakeFS FileSystem! This native Hadoop FileSystem implementation

Product

Air & Water: The Airflow and lakeFS Integration

Itai Admi

Today we are excited to announce the official release of the lakeFS Airflow provider! This package lets you easily integrate lakeFS functionality into your Airflow DAGs. The library is published on PyPI, so it can be installed in your project with the command: pip install airflow-provider-lakefs Once installed, you are

Product Tutorials

Power Amazon EMR Applications with Git-like Operations Using lakeFS

Itai Admi

This article provides a detailed explanation of how to use lakeFS with Amazon EMR. Today, it’s common to manage a data lake using cloud object stores like AWS S3, Azure Blob Storage, or Google Cloud Storage as the underlying storage service. Each cloud provider offers a set of managed services to simplify the way

Product

Building Reproducible Data Pipelines with Airflow and lakeFS

Guy Hardonag

Update (May 26th, 2021): We officially released the lakeFS Airflow provider. Read all about it in the latest blog post. In this post, we’ll see how easy it is to use lakeFS with an existing Airflow DAG, to make every step in a pipeline completely reproducible in both code and data. This is done without
