lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data

Tutorials

Getting started with lakeFS Cloud

Iddo Avneri

A step-by-step guide to the lakeFS Cloud playground environment. In this document, you will learn the quickest way to get started with lakeFS by using the playground experience in lakeFS Cloud. Then I will cover how to connect your own storage to lakeFS so that you can run lakeFS against your own data. Step 1: […]

Tutorials

Schema Validation with lakeFS: A Step-by-Step Guide

Iddo Avneri

Introduction: Schema validation ensures that the data stored in the lake conforms to a predefined schema, which specifies the structure, format, and constraints of the data. It’s important for: […] Data lakes offer much more flexibility than the rigid data model of a data warehouse; there are typically a variety […]

Best Practices Tutorials

How to Migrate or Clone a lakeFS Repository: Step-by-Step Tutorial

Amit Kesarwani

Introduction: Follow this tutorial if you want to migrate or clone repositories from a source lakeFS environment to a target lakeFS environment. Your source and target lakeFS environments can run locally or in the cloud. You can also follow this tutorial if you want to migrate or clone a source repository to a target repository […]

Best Practices Tutorials

Version Control Data Pipelines Using the Medallion Architecture

Iddo Avneri

A step-by-step guide to running pipelines on the Bronze, Silver, and Gold layers with lakeFS. Introduction: The Medallion Architecture is a data design pattern that organizes a data pipeline into three distinct tiers based on functionality: bronze, silver, and gold. The bronze tier holds the raw data as it is ingested, while the silver and […]

Best Practices Machine Learning Tutorials

Building an ML Experimentation Platform for Easy Reproducibility Using lakeFS

Vino SD

MLOps is mostly data engineering. As organizations move past the MLOps hype cycle, we are realizing that MLOps and data engineering overlap significantly. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models. Only 5% of developing and deploying […]

Data Engineering Tutorials

9 Best Practices For Handling Late-Arriving Data

Adi Polak

Processing what you’d call the “latest” data may sound simple, but in reality it’s complex and challenging. When you gather time-based data, you’ll quickly notice that some data are born late, some achieve lateness, and others are forced to become late. How does that happen? Here are a few good reasons why late-arriving data are so […]

Tutorials

Authorization (RBAC) in lakeFS: Step-by-Step Configuration Tutorial

Amit Kesarwani

Introduction: Last month, the lakeFS team decided to decouple the security, authentication, and access control features so that you can plug in your own authentication and security mechanism. As a result, the team changed the architecture to a pluggable one that lets you choose your preferred mechanism without depending on the lakeFS solution.

Tutorials

The Airflow and lakeFS Integration: Step-by-Step Configuration Tutorial

Amit Kesarwani

Introduction: lakeFS lets you create isolated environments for data ingestion instantly. You can run ingestion jobs without impacting your production data, then merge the ingested data into production atomically. This frees you from spending time on environment maintenance and lets you create as many environments as you need. If ingested data […]

Product Tutorials

Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial

Iddo Avneri

Introduction: This tutorial reviews all the steps needed to configure lakeFS on Databricks. It assumes that lakeFS is already set up and running against your storage (in this example, AWS S3) and focuses on setting up the Databricks and lakeFS integration. Prerequisites: […] Step 1: Acquire lakeFS Key and Secret. In this step, […]
