

Oz Katz

Oz Katz is the CTO and Co-founder of lakeFS, an...

Published on February 25, 2026

The AI agent revolution is here. Coding agents like Claude Code, Cursor, and Codex are writing production software. Infrastructure agents are provisioning cloud resources. Data agents are transforming pipelines and updating dashboards. The trajectory is clear: agents will do more, not less.

But there’s a pattern hiding in plain sight, one that explains why coding agents are so far ahead of everything else, and what needs to happen for agents to be trusted with the rest of your organization’s critical assets.

The Three Pillars of Agentic Work

Every AI agent, regardless of domain, needs three things to do useful work:

Context: 

What should I be doing? This is the task definition, the requirements, the background knowledge that shapes the agent’s decisions. For a coding agent, it’s the issue description, the codebase architecture, the style guide. For a data agent, it might be a business rule, a schema definition, or a quality threshold.

Tools: 

How do I modify things? These are the interfaces the agent uses to interact with the world. Text editors, command-line utilities, APIs, package managers. The mechanisms of action.

State: 

What will I be modifying? This is the actual artifact the agent is working on. The code. The data. The configuration. The thing that matters.

Two of these three are improving rapidly, across every domain. Context keeps getting better: context windows are growing, agentic memory frameworks are maturing, and vector databases give agents vast retrieval capabilities. Tools are proliferating through standards like the Model Context Protocol (MCP), sandboxed execution environments, and skill frameworks that give agents structured ways to interact with complex systems. These improvements benefit coding agents, data agents, infrastructure agents, and everything in between.

But the third pillar, state management, is a different story. It’s a solved problem for code. It remains largely unsolved for everything else. And that gap is the single biggest factor determining where agents can be trusted today and where they can’t.

3 pillars of agentic work

Why Coding Agents Work (State, Solved)

Coding agents aren’t trusted because they’re smarter than other agents. They’re trusted because the state management problem was solved for code, decades before AI entered the picture. It’s called version control.

Think about what happens when a coding agent goes to work. It checks out a branch. It reads files, edits code, runs tests, installs dependencies. It modifies the state freely, with confidence, because that state is isolated. Nothing it does affects production until a human says so.

When the agent is done, the human reviews a diff. They can run the code before accepting it. They can run the test suite. They can deploy to staging. They can even decide to review the generated code line by line, change by change: they can see exactly what was modified and why. And only when they’re satisfied do they merge.

This is the “trust, but verify” loop, and it’s powered by version control:

  • Isolation. The agent modifies state on a branch, never touching the main line
  • Verification. Diffs, code review, and CI/CD pipelines provide full transparency into what changed
  • Acceptance. Merging is an explicit, atomic act of approval
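The loop above can be sketched in a few lines of Python. This is a toy model, not how Git works internally: the "store" is a plain dict, branching copies state, and merging is an explicit approval step.

```python
# Toy sketch of the "trust, but verify" loop: isolation, verification, acceptance.

def create_branch(store, source, name):
    """Isolation: the agent gets its own copy; the source is untouched."""
    store[name] = dict(store[source])

def diff(store, base, branch):
    """Verification: show exactly what the agent added, changed, or removed."""
    a, b = store[base], store[branch]
    return {
        "added":   sorted(k for k in b if k not in a),
        "changed": sorted(k for k in b if k in a and a[k] != b[k]),
        "removed": sorted(k for k in a if k not in b),
    }

def merge(store, base, branch):
    """Acceptance: one explicit act replaces the main line with the reviewed state."""
    store[base] = store.pop(branch)

store = {"main": {"app.py": "v1", "test.py": "v1"}}
create_branch(store, "main", "agent/fix")
store["agent/fix"]["app.py"] = "v2"    # the agent edits freely in isolation
store["agent/fix"]["util.py"] = "v1"   # main sees none of this yet

print(diff(store, "main", "agent/fix"))
# {'added': ['util.py'], 'changed': ['app.py'], 'removed': []}
merge(store, "main", "agent/fix")      # the human approves; the change lands
```

The agent can be as aggressive as it likes inside `agent/fix`; until `merge` runs, `main` is untouched.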

Version control didn’t just enable collaboration between humans. It created the trust infrastructure that makes human-agent collaboration possible. The agent can be bold. The human can be cautious. And the system mediates between them cleanly.

Claude Code, hard at work refactoring a React component

The Missing Layer: State

Now ask yourself: would you give an AI agent the same level of access to your organization’s data?

Would you let it rewrite a production metrics table? 

Repartition a dataset that feeds your revenue dashboards? 

Restructure a feature store that powers your ML models? Directly? In production? With no safety net?

Of course not.

And yet, that’s the state of the art for most data-oriented agent workflows today. The other two pillars are already there: context is covered, with agents that understand schemas, read documentation, and have full access to the semantic layer. Tools are covered, with SQL interfaces, Spark connectors, DBT integrations, and MCP servers for every data platform imaginable. Two out of three pillars are in place.

But the third? State management for data? It’s still “deploy and pray”.

There’s no branch. There’s no diff. There’s no pull request. There’s no atomic merge. There’s no way to let an agent work freely and maintain the ability to verify before accepting. The trust infrastructure simply doesn’t exist for data the way it does for code.

This is the bottleneck. Not model intelligence. Not tool availability. Not context length. The bottleneck is the absence of a version control layer for the actual state that agents modify.

Enter lakeFS: The Data Access Layer for AI Agents

This is exactly why lakeFS exists, and why it matters more now than ever.

lakeFS bridges the infrastructure gap that slows enterprise AI initiatives by acting as a control plane for AI-ready data. In an agent-driven world, AI-ready data isn’t just clean or accessible – it’s data that can be safely changed. Built on a scalable data version control architecture, lakeFS introduces a governed state layer between your agents (or any compute engine) and your object storage, giving organizations control over how data is modified and promoted.

It doesn’t copy data, doesn’t require migration, and works with the tools and formats you already use: from structured formats such as Parquet, Iceberg, and Delta Lake, to semi-structured and unstructured data such as images, videos, and PDF documents.

For AI agents operating on data, lakeFS provides the exact same trust infrastructure that Git provides for code:

Isolation through branching 

An agent creates a branch and operates on a fully consistent, isolated snapshot of the data. It can write, delete, repartition, and transform freely. Nothing is visible to production or other consumers until explicitly merged. Multiple agents can work in parallel on separate branches without interfering with each other.

Verification through diffing and review 

When the agent’s work is complete, lakeFS shows exactly what changed: which objects were added, modified, or deleted. Pre-merge hooks can trigger automated validation: schema checks, data quality tests, row count comparisons, custom business rules. A human (or another agent) can review the change before it goes live.
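Conceptually, a pre-merge hook is just a set of checks evaluated against the branch before the merge is allowed. The sketch below is illustrative only; the check names and the 1% threshold are invented for this example, not part of the lakeFS API:

```python
# Hypothetical pre-merge validation: every registered check must pass
# before the agent's branch is allowed to merge.

def row_count_check(old_rows, new_rows, max_drop=0.01):
    """Fail if the branch lost more than 1% of rows versus production."""
    return new_rows >= old_rows * (1 - max_drop)

def schema_check(old_columns, new_columns):
    """Fail if any production column disappeared (additions are fine)."""
    return set(old_columns) <= set(new_columns)

def pre_merge(checks):
    """Block the merge unless every check passes; report what failed."""
    failures = [name for name, ok in checks.items() if not ok]
    return (len(failures) == 0, failures)

ok, failures = pre_merge({
    "row_count": row_count_check(old_rows=1_000_000, new_rows=999_500),
    "schema":    schema_check(["id", "ts", "amount"],
                              ["id", "ts", "amount", "region"]),
})
print(ok, failures)  # True []
```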

Acceptance through atomic merge 

Merging in lakeFS is an atomic operation. The data is either fully applied or not at all. No partial states, no inconsistencies, no half-applied transformations visible to downstream consumers. This is the “commit and push” moment for data.
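Why does an atomic merge show no partial states? A simplified way to picture it: the new snapshot is fully assembled off to the side, and a single reference swap makes it live. Readers resolve the reference first, so they see the old snapshot or the new one, never a mixture. (A toy sketch of the idea, not lakeFS internals.)

```python
# Sketch: atomicity via a single pointer swap between immutable snapshots.

snapshots = {
    "s1": {"sales.parquet": "old", "users.parquet": "old"},
}
refs = {"main": "s1"}  # production readers follow refs["main"]

def read(path):
    """Readers always resolve the ref first, then read one snapshot."""
    return snapshots[refs["main"]][path]

def merge(new_id, new_snapshot):
    snapshots[new_id] = new_snapshot  # staged: not yet visible to anyone
    refs["main"] = new_id             # one atomic swap makes it all live

merge("s2", {"sales.parquet": "new", "users.parquet": "new"})
print(read("sales.parquet"), read("users.parquet"))  # new new
```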

This is the “trust, but verify” loop, applied to data at scale.

What This Unlocks

When agents have proper state management, the entire calculus of what you’re willing to automate changes.

An agent can compact and optimize your Iceberg tables on a branch, validate that query performance improved and no data was lost, and merge, all without risking a single production query. An agent can ingest and integrate a new data source, apply transformations, run quality checks, and present the result as a pull request for the data team to review. An agent can experiment with different partitioning strategies, benchmark each one in isolation, and promote the winner.

The pattern is always the same: let the agent work freely in isolation, verify the result, and accept atomically.

This isn’t just about safety (although safety alone would justify it). It’s about velocity. The reason coding agents ship so much code is not just because they’re fast at writing it. It’s because the infrastructure around them makes it cheap to try, cheap to verify, and cheap to revert. The same dynamic applies to data. When reverting a bad transformation is a single operation instead of an incident, you let agents take bigger swings. When verification is built into the workflow, you don’t need to babysit every step.
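"Reverting is a single operation" follows from the same model: if every state is an immutable commit, revert just means moving the branch reference back to a known-good commit. Again a toy illustration, not lakeFS internals:

```python
# Sketch: revert as a one-step reference move between immutable commits.

commits = {
    "c1": {"metrics.parquet": "good"},
    "c2": {"metrics.parquet": "bad-transform"},  # the agent's mistake
}
head = {"main": "c2"}

def revert(branch, commit_id):
    head[branch] = commit_id  # the bad state is simply no longer reachable

revert("main", "c1")
print(commits[head["main"]]["metrics.parquet"])  # good
```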

The Future Is Agentic, If We Build the Right Infrastructure

We’re at an inflection point. AI agent capabilities are improving on a curve that shows no sign of flattening. Context will keep getting richer. Tools will keep getting more capable. Two of the three pillars will take care of themselves.

The third one, state management, is the one we have to build. For code, it already exists. For data, it’s lakeFS.

Trust your agents. But give yourself the means to verify.
