
lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data


Thought Leadership

Best Practices Product Thought Leadership

Git-Style Workflows for Multimodal AI Data Using Dremio and lakeFS

Alex Merced, Tal Sofer

This post recaps a comprehensive tutorial published by Alex Merced from Dremio and Tal Sofer from lakeFS, highlighting how version control transforms multimodal data management for AI teams.

The Challenge: Keeping Diverse Data Types in Sync and Queryable

Modern AI pipelines consume more than just structured data. Training sets include images, model artifacts, logs, and […]

Product Thought Leadership

A Celebration of Shared Vision: lakeFS and DVC

Einat Orr, PhD

From Inspiration to Action

When we were still dreaming up lakeFS, one of the projects that inspired us was DVC (Data Version Control). It was one of those moments when you realize – “Ah, others see it too.” We weren’t alone in believing that data should be managed like code. DVC was built by data

Product Thought Leadership

lakeFS Named a Representative Vendor in the 2025 Gartner® Market Guide for DataOps Tools

Gottfried Sehringer

We’re excited to share that lakeFS has been named a Representative Vendor in the 2025 Gartner® Market Guide for DataOps Tools. We believe this recognition reflects what we’re seeing across the industry: the urgent need for data infrastructures that can provide AI-ready data efficiently, repeatably, and safely as organizations build production AI systems. DataOps Market

Best Practices Machine Learning Thought Leadership

OpenAI’s Open Source Revolution: Why Enterprise AI Infrastructure Matters More Than Ever

Gottfried Sehringer

Yesterday, OpenAI launched gpt-oss-120b and gpt-oss-20b, marking the company’s first open-weight models since GPT-2 in 2019. This strategic shift represents far more than a product release: it signals a fundamental transformation in how large organizations, particularly in regulated industries, approach AI infrastructure and data management.

OpenAI’s Strategic Return to Open Source

The gpt-oss models (gpt-oss-120b and gpt-oss-20b) are

Best Practices Product Thought Leadership

The Evolving Equation: When Do You Move From Open Source to Enterprise with Data Version Control?

Tal Sofer

Open source software has fundamentally reshaped technology—delivering unmatched flexibility, low friction, and rapid innovation. For some teams, it’s a philosophical commitment. For others, it’s the fastest path to building. lakeFS supports both models. For most data teams, the journey starts with open source and evolves over time. lakeFS open source offers a robust foundation for

Data Engineering Machine Learning Thought Leadership

The State of Data and AI Engineering 2025

Einat Orr, PhD

Since 2021, we’ve published the annual State of Data Engineering Report, which summarizes all key categories that directly impact data engineering infrastructure. In 2025, we see five primary trends that shape the categories covered in this report.

Trend #1: The MLOps space is slowly diminishing

The MLOps space is slowly

Machine Learning Thought Leadership

Distributed Data Management is Broken – Here’s Why You Should Care

Tal Sofer

In today’s data-driven world, businesses don’t just rely on data – they are built on it. But as data infrastructure sprawls across on-prem systems, multiple cloud providers, and third-party platforms, a new challenge is taking center stage: distributed data management. It’s a silent bottleneck with loud consequences.

Challenges in Distributed Data Management

Managing data across

Thought Leadership

The Road Forward Is the Road Back: My Return to Treeverse

Barak Amar

When I first joined lakeFS by Treeverse in 2020, we were just four engineers building an open-source solution for data versioning. It was exhilarating—being part of something from the ground up, shaping the product, and seeing it grow. But after four years, something changed. The excitement faded, and I felt like I was running in

Best Practices Product Thought Leadership

Dataset Versioning in the Age of Open Table Formats

Tal Sofer

Originally presented at Big Data LDN 2024. More than two decades ago, data warehouses outgrew the capacity of single machines, and scaling them started to become costly or inefficient. This prompted the tech industry to rethink the architecture and start to use distributed systems. If we wanted to store more data, we just bought more
