Best Practices

Best Practices, Data Engineering, Machine Learning

lakeFS Top 10 Defining Product Milestones in 2025

Oz Katz

2025 was a defining year for lakeFS. Across open source and Enterprise editions, we shipped major capabilities that expanded lakeFS from a powerful data versioning layer into a control plane for AI-Ready Data – spanning structured and unstructured data, multiple public and private clouds, and a growing ecosystem of analytics and ML engines. Here’s our

Best Practices, Product

How CytoReason Streamlined Nextflow with lakeFS for Smarter Data Pipelines

Ron Poches

TL;DR CytoReason is a technology company transforming biopharma’s decision-making—from trial and error to data-driven—through its AI platform of computational disease models. Leveraging an extensive database of public and proprietary data, the company maps human diseases tissue by tissue and cell by cell. Researchers at leading pharma companies, including Pfizer and Sanofi, rely on CytoReason’s technology

Best Practices, Data Engineering, Machine Learning

Building a Data Center of Excellence for Modern Data Teams

Einat Orr, PhD

Sooner or later, every data team will reach a point where things stop working – whether it’s due to team growth, changing business requirements, or growing pipeline complexity. When facing these issues, leaders start considering an approach that balances centralized and decentralized organizational models. A Data Center of Excellence (DCoE) is a centralized

Best Practices, Machine Learning

Iceberg Tables Management: Processes, Challenges & Best Practices

Itai Gilo

We all love data lakes. They’re just perfect for storing massive volumes of structured, semi-structured, and unstructured data in native file formats. And they let us explore, refine, and analyze petabytes of data constantly pouring in from various sources. But there’s a caveat. The individual files in a data lake lack the necessary information for

Best Practices, Product, Thought Leadership

Git-Style Workflows for Multimodal AI Data Using Dremio and lakeFS

Alex Merced, Tal Sofer

This post recaps a comprehensive tutorial published by Alex Merced from Dremio and Tal Sofer from lakeFS, highlighting how version control transforms multimodal data management for AI teams. The Challenge: Keeping Diverse Data Types in Sync and Queryable. Modern AI pipelines consume more than just structured data. Training sets include images, model artifacts, logs, and
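To make the git-style workflow concrete, here is a minimal sketch using the lakeFS high-level Python SDK: branch off main, add a multimodal asset, commit, and merge back. The repository name, branch names, and object path are illustrative, and the client is assumed to pick up credentials from the usual lakectl configuration or environment variables; see the full tutorial for the exact setup used with Dremio.

```python
import lakefs

# Open the repository; credentials are assumed to come from ~/.lakectl.yaml
# or the LAKECTL_* environment variables the SDK reads by default.
repo = lakefs.repository("multimodal-data")

# Branch off main so the experiment is isolated, like a git feature branch.
exp = repo.branch("add-new-images").create(source_reference="main", exist_ok=True)

# Add a multimodal asset (an image here; model artifacts or logs work the same way).
with open("sample.png", "rb") as f:
    exp.object("images/sample.png").upload(data=f.read())

# Commit to create an immutable snapshot of the branch.
exp.commit(message="Add sample image to training set")

# Merge the validated change back into main.
exp.merge_into(repo.branch("main"))
```

Because every reference resolves to a consistent snapshot, an engine such as Dremio can query either the experiment branch or main without the two interfering.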

Best Practices, Product, Tutorials

Adding Data Version Control Capabilities to MATLAB with lakeFS

Joe Pringle

Many lakeFS customers in the aerospace, automotive, healthcare & life sciences, and manufacturing industries are also heavy users of MATLAB. lakeFS solves a range of data ops challenges for these organizations by serving as a “control plane” for AI-ready data – versioning complex data pipelines, tracking metadata and lineage, and enabling team collaboration through git-like

Best Practices, Data Engineering, Machine Learning

How lakeFS Transactional Mirroring Keeps Your Data Available During Cloud Outages

Idan Novogroder

When AWS Goes Down, Your Data Shouldn’t. On October 20th, 2025, AWS experienced a significant outage centered in the us-east-1 region. What started as a DNS resolution issue affecting DynamoDB quickly cascaded into widespread failures across major services and applications. From gaming platforms like Fortnite and social apps like Snapchat to enterprise systems and IoT

Best Practices, Data Engineering, Machine Learning

Bound by Physics: Why Data Version Control is Critical for Real-World AI

Vince Antinozzi, Yoav Yetinson

TL;DR Software-only systems can be rerun from the source, but physics-bound workflows face a tougher challenge. Once a moment is gone, it’s gone. Sensor drift, hardware changes, and environmental uniqueness make it impossible to recreate the exact conditions. For audits, safety, and machine learning, you need full data provenance, including raw data, derived outputs, and

Best Practices, Data Engineering, Machine Learning

Versioning Data Labels: Integrating Labeling Tools with lakeFS

Iddo Avneri

In this post, we explore how lakeFS can integrate with popular data labeling solutions, the differences between labeling tools’ built-in dataset management and lakeFS data version control, and why combining them is invaluable. We’ll also highlight use cases – from autonomous vehicles to healthcare – where rigorous data versioning alongside labeling is essential. Overview of

Best Practices, Product, Tutorials

Versioned Data with Apache Iceberg Using lakeFS Iceberg REST Catalog

Amit Kesarwani

lakeFS Enterprise offers a fully standards-compliant implementation of the Apache Iceberg REST Catalog, enabling Git-style version control for structured data at scale. This integration allows teams to use Iceberg-compatible tools like Spark, Trino, and PyIceberg without any vendor lock-in or proprietary formats. By treating Iceberg tables as versioned entities within lakeFS repositories and branches, users
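As a rough illustration of what this looks like from a client, here is a sketch that points PyIceberg at a lakeFS Iceberg REST Catalog endpoint and reads a table from a specific branch. The endpoint URI, token property, and the repository.branch.namespace.table identifier layout are assumptions for illustration; the lakeFS Enterprise documentation defines the exact values for a given deployment.

```python
from pyiceberg.catalog import load_catalog

# Connect to the lakeFS Iceberg REST Catalog; URI and token are illustrative
# and depend on how the Enterprise deployment is configured.
catalog = load_catalog(
    "lakefs",
    type="rest",
    uri="https://lakefs.example.com/iceberg/api",
    token="<lakeFS-access-token>",
)

# The table identifier is assumed to encode repository and branch, so the
# same table can be read from main or from an experiment branch.
table = catalog.load_table("analytics-repo.main.sales.orders")
print(table.scan().to_arrow())
```

Spark and Trino should be able to use the same endpoint through their standard Iceberg REST catalog configuration, which is what keeps the integration free of proprietary formats.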
