


Tutorials

Best Practices Product Tutorials

Adding Data Version Control Capabilities to MATLAB with lakeFS

Joe Pringle

Many lakeFS customers in the aerospace, automotive, healthcare & life sciences, and manufacturing industries are also heavy users of MATLAB. lakeFS solves a range of data ops challenges for these organizations by serving as a “control plane” for AI-ready data – versioning complex data pipelines, tracking metadata and lineage, and enabling team collaboration through git-like […]

Best Practices Product Tutorials

Versioned Data with Apache Iceberg Using lakeFS Iceberg REST Catalog

Amit Kesarwani

lakeFS Enterprise offers a fully standards-compliant implementation of the Apache Iceberg REST Catalog, enabling Git-style version control for structured data at scale. This integration allows teams to use Iceberg-compatible tools like Spark, Trino, and PyIceberg without any vendor lock-in or proprietary formats. By treating Iceberg tables as versioned entities within lakeFS repositories and branches, users

Best Practices Machine Learning Product Tutorials

A Single Pane of Glass to Your Data: Multiple Storage Backends Support in lakeFS

Tal Sofer

Today’s organizations don’t just use a single data storage solution – they operate across on-prem servers, multiple cloud providers, and hybrid environments. This distributed approach has become necessary, but it comes with significant costs: teams struggle with siloed tools, duplicated processes, and an endless cycle of environment management that diverts focus from delivering actual value. 

Best Practices Product Tutorials

How to Avoid Data Breaches by Using RBAC

Amit Kesarwani

Introduction Role-Based Access Control (RBAC) is an effective way to minimize the risk of data breaches by ensuring users only have access to the data and systems necessary for their job roles. Here’s how you can use RBAC to avoid data breaches: 1. Principle of Least Privilege (PoLP) 2. Define Clear Roles and Responsibilities 3.
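The Principle of Least Privilege named above can be sketched in a few lines: each role grants only the permissions its holders need, and anything not explicitly granted is denied. The roles and permission names below are illustrative stand-ins, not part of any lakeFS API.

```python
# Hypothetical RBAC sketch: each role maps to the minimum set of
# permissions required for that job role (Principle of Least Privilege).
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "manage_users"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: allow only actions the role explicitly grants."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("analyst", "read"))   # → True
print(is_allowed("analyst", "write"))  # → False
```

Because unknown roles fall back to an empty permission set, a misconfigured user gets no access rather than accidental full access.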

Product Tutorials

How To Get Started with lakeFS Enterprise: Step-by-Step Tutorial

Amit Kesarwani

What is lakeFS Enterprise? lakeFS Enterprise is a commercially supported version of lakeFS, offering the additional features and functionality organizations need from a production-grade system. Why did we build lakeFS Enterprise? lakeFS Enterprise was built for organizations that require the support, security standards, and features of a production-grade system and are not

Best Practices Product Tutorials

Collaborating Over Data: Introducing Pull Requests in lakeFS

Oz Katz, Itai Gilo

In modern software development, Pull Requests (PRs) are a fundamental tool for collaborating on code. They allow teams to review, discuss, and merge changes in a controlled and transparent way.  But what if you could apply that same concept to data? At lakeFS, we’re excited to introduce Pull Requests for data — a new feature
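The code-review analogy can be made concrete with a toy model: propose changes on an isolated branch, inspect the diff, and merge only after approval. In lakeFS this is a built-in workflow over real branches; the dicts below are illustrative stand-ins.

```python
# Toy sketch of a "pull request for data": branch, diff, review, merge.
main = {"daily/report.csv": "v1"}
branch = dict(main)                  # isolated copy, like a lakeFS branch
branch["daily/report.csv"] = "v2"    # proposed change, invisible to main

def diff(base, head):
    """List keys whose contents differ between the two branches."""
    return [k for k in head if base.get(k) != head[k]]

changes = diff(main, branch)
print(changes)                       # → ['daily/report.csv']

# After review and approval, merging applies the changes to main.
main.update({k: branch[k] for k in changes})
print(main["daily/report.csv"])      # → v2
```

The key property mirrored here is isolation: until the merge, consumers of `main` never see the proposed change.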

Best Practices Product Tutorials

Guide To The lakeFS File Representation

Iddo Avneri

Once you start using lakeFS, the files on your object store take on a new representation: their names and paths will no longer look the same. This article provides a high-level overview of the lakeFS file representation to help you understand how it
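The core idea behind the changed paths can be sketched as a two-level lookup: logical paths (repo, branch, key) resolve through metadata to opaque, content-derived object names in the store. The hashing scheme and layout below are simplified stand-ins, not lakeFS's actual on-store format.

```python
import hashlib

metadata = {}       # logical path (repo, branch, key) -> physical object key
object_store = {}   # physical key -> file contents

def upload(repo: str, branch: str, path: str, data: bytes) -> str:
    """Store data under an opaque physical address and record the mapping."""
    physical = f"{repo}/data/{hashlib.sha256(data).hexdigest()}"
    object_store[physical] = data
    metadata[(repo, branch, path)] = physical
    return physical

def read(repo: str, branch: str, path: str) -> bytes:
    """Resolve the logical path via metadata, then fetch the object."""
    return object_store[metadata[(repo, branch, path)]]

key = upload("my-repo", "main", "datasets/users.csv", b"id,name\n1,ada\n")
print(key.startswith("my-repo/data/"))   # → True: the physical name is opaque
print(read("my-repo", "main", "datasets/users.csv"))
```

Because the physical name derives from content rather than the logical path, two branches can share the same underlying object, which is what makes branching cheap.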

Best Practices Product Tutorials

Metadata Enforcement: Step-by-Step Tutorial

Amit Kesarwani

Metadata enforcement is a broad term that can refer to different aspects of managing and controlling metadata. Let’s explore a few key areas: Understanding Metadata Enforcement 1 – Data Privacy and Protection 2 – Data Governance and Quality 3 – Legal and Compliance Challenges in Metadata Enforcement Strategies for Effective Metadata Enforcement We will focus on

Best Practices Tutorials

Delta Time Travel in Databricks: How It Works

Tal Sofer

Databricks Delta Lake includes a number of time travel features that let you access any previous version of the data that Delta automatically versions and stores in your data lake. This makes it simple to audit changes, roll back data in the event of accidental bad writes or deletes, and reproduce reports and experiments. How
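The mechanics can be illustrated with an in-memory version log: every commit produces an immutable snapshot, and a read either takes the latest snapshot or travels back to an earlier one. The real feature is backed by Delta Lake's transaction log (in Spark, `option("versionAsOf", n)` on a Delta read); the class below is a conceptual stand-in only.

```python
# Conceptual sketch of Delta-style time travel over an in-memory version log.
class VersionedTable:
    def __init__(self):
        self._versions = []          # version n = snapshot after commit n

    def commit(self, rows):
        """Append a new immutable snapshot, like a Delta commit."""
        self._versions.append(list(rows))

    def read(self, version_as_of=None):
        """Read the latest snapshot, or travel back to an earlier version."""
        if version_as_of is None:
            version_as_of = len(self._versions) - 1
        return self._versions[version_as_of]

t = VersionedTable()
t.commit(["a", "b"])             # version 0
t.commit(["a", "b", "c"])        # version 1
print(t.read())                  # → ['a', 'b', 'c']
print(t.read(version_as_of=0))   # → ['a', 'b']
```

A rollback after a bad write is then just a read of an older version followed by a new commit of that snapshot.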

Best Practices Tutorials

What Is Write-Audit-Publish and Why Should You Care?

Einat Orr, PhD

The Write-Audit-Publish (WAP) pattern in data engineering gives teams greater control over data quality. But what does it entail, and how do you implement it? Keep reading to learn more about the Write-Audit-Publish pattern, examine its use cases, and get a practical implementation example. What is Write-Audit-Publish all about? Write-Audit-Publish (WAP) aims to boost trust
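The three stages of WAP can be sketched with plain Python lists standing in for branches; in lakeFS the same flow runs over real branches and merges. All names below are illustrative.

```python
# Minimal Write-Audit-Publish sketch: stage, validate, then promote.
def write(staging, rows):
    # 1. Write: land new data in an isolated staging area, not production.
    staging.extend(rows)

def audit(staging):
    # 2. Audit: validate staged data before anyone can consume it.
    return all(isinstance(r, dict) and "id" in r for r in staging)

def publish(staging, production):
    # 3. Publish: promote audited data to production in one step.
    production.extend(staging)
    staging.clear()

staging, production = [], []
write(staging, [{"id": 1}, {"id": 2}])
if audit(staging):
    publish(staging, production)
print(production)   # → [{'id': 1}, {'id': 2}]
```

The point of the pattern is that production only ever sees data that has passed the audit step; rows that fail validation simply never leave staging.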

Best Practices Tutorials

Power Up Your Lakehouse with Git Semantics and Delta Lake

Oz Katz

The lakehouse architecture has become the backbone of modern big data operations, but it comes with specific issues. The challenge of data versioning arises in various DataOps areas. Fortunately, open-source tools can help overcome these issues. In this article, we’ll demonstrate how, by implementing Git-like semantics, Delta Lake and lakeFS can work together to
