Best Practices

Best Practices Data Engineering Machine Learning

How lakeFS Transactional Mirroring Keeps Your Data Available During Cloud Outages

Idan Novogroder

When AWS Goes Down, Your Data Shouldn’t. On October 20th, 2025, AWS experienced a significant outage centered in the us-east-1 region. What started as a DNS resolution issue affecting DynamoDB quickly cascaded into widespread failures across major services and applications. From gaming platforms like Fortnite and social apps like Snapchat to enterprise systems and IoT

Best Practices Data Engineering Machine Learning

Bound by Physics: Why Data Version Control is Critical for Real-World AI

Vince Antinozzi, Yoav Yetinson

TL;DR: Software-only systems can be rerun from the source, but physics-bound workflows face a tougher challenge. Once a moment is gone, it’s gone. Sensor drift, hardware changes, and environmental uniqueness make it impossible to recreate the exact conditions. For audits, safety, and machine learning, you need full data provenance, including raw data, derived outputs, and

Best Practices Data Engineering Machine Learning

Versioning Data Labels: Integrating Labeling Tools with lakeFS

Iddo Avneri

In this post, we explore how lakeFS can integrate with popular data labeling solutions, the differences between labeling tools’ built-in dataset management and lakeFS data version control, and why combining them is invaluable. We’ll also highlight use cases – from autonomous vehicles to healthcare – where rigorous data versioning alongside labeling is essential. Overview of

Best Practices Product Tutorials

Versioned Data with Apache Iceberg Using lakeFS Iceberg REST Catalog

Amit Kesarwani

lakeFS Enterprise offers a fully standards-compliant implementation of the Apache Iceberg REST Catalog, enabling Git-style version control for structured data at scale. This integration allows teams to use Iceberg-compatible tools like Spark, Trino, and PyIceberg without any vendor lock-in or proprietary formats. By treating Iceberg tables as versioned entities within lakeFS repositories and branches, users

Best Practices Machine Learning Thought Leadership

OpenAI’s Open Source Revolution: Why Enterprise AI Infrastructure Matters More Than Ever

Gottfried Sehringer

Yesterday, OpenAI launched gpt-oss-120b and gpt-oss-20b, marking the company’s first open-weight models since GPT-2 in 2019. This strategic shift represents far more than a product release: it signals a fundamental transformation in how large organizations, particularly in regulated industries, approach AI infrastructure and data management. OpenAI’s Strategic Return to Open Source: The gpt-oss models (gpt-oss-120b and gpt-oss-20b) are

Best Practices Product Thought Leadership

The Evolving Equation: When Do You Move From Open Source to Enterprise with Data Version Control

Tal Sofer

Open source software has fundamentally reshaped technology—delivering unmatched flexibility, low friction, and rapid innovation. For some teams, it’s a philosophical commitment. For others, it’s the fastest path to building. lakeFS supports both models. For most data teams, the journey starts with open source and evolves over time. lakeFS open source offers a robust foundation for

Best Practices Machine Learning

AI-Ready Data: Characteristics, Challenges & Best Practices

Tal Sofer

Despite the increasing adoption of Artificial Intelligence (AI) applications, most organizations are bound to see implementation challenges. One of the issues lies in the data itself. A recent survey showed 80% of companies believe their data is suitable for AI, but more than half are actually dealing with challenges like internal data quality and categorization

Best Practices Machine Learning Product Tutorials

A Single Pane of Glass to Your Data: Multiple Storage Backends Support in lakeFS

Tal Sofer

Today’s organizations don’t just use a single data storage solution – they operate across on-prem servers, multiple cloud providers, and hybrid environments. This distributed approach has become necessary, but it comes with significant costs: teams struggle with siloed tools, duplicated processes, and an endless cycle of environment management that diverts focus from delivering actual value. 

Best Practices Data Engineering Machine Learning

6 Types of Metadata: Examples, Tools & Frameworks

Idan Novogroder

With the volumes of generated data increasing, metadata has become an essential component in organizing and comprehending massive datasets. Metadata plays a key role in any modern data strategy, especially among organizations that treat data as one of their most precious assets. This article dives into all the different metadata types, tools, and frameworks to

Best Practices Machine Learning

What is AI Data Storage? Benefits, Challenges & Best Practices

Tal Sofer

Many companies are modernizing their data storage infrastructure to capitalize on the opportunities of machine learning (ML) and advanced analytics. However, teams face several unique data management challenges, such as the increasing time required for AI training and inference workloads, as well as the cost and scarcity of resources, particularly GPUs. Storage is a key
