



Headless agents are coming for your data. Be ready with lakeFS.

Oz Katz
Last updated on May 6, 2026



The lakeFS Control Plane for AI-ready Data provides agents that rely on large, multimodal datasets with isolated access, verifiable results, and built-in governance.

TL;DR

  • Headless agents are entering enterprise data workflows in production. Salesforce coined the term at TrailblazerDX 2026: AI agents that operate software through APIs, MCP tools, and command lines, with no user interface and no human in the loop per action.
  • Agents are becoming the primary consumer of enterprise data. Three things break when that happens: scale becomes unmanageable, errors spread faster than humans can catch them, and governance collapses across scattered logs.
  • lakeFS, the control plane for AI-ready data, sits between data storage and the tools, people, and agents that consume it. Branches, commits, and merges – the primitives that let teams collaborate on data the way developers collaborate on code – turn out to be exactly what autonomous agents need.
  • Three outcomes matter most: isolation (contain the blast radius), reproducibility (replay any agent run), and governance (one audit trail across everything).
  • Repository-level branching is the differentiator. lakeFS branches span structured tables, unstructured files, images, video, and metadata together, across any object store – not one table format, not one catalog.
  • No new agent infrastructure required. Agents read and write lakeFS through standard file operations. No custom MCP server, no SDK, no special integration.

A new kind of consumer for your data

A few weeks ago at TrailblazerDX 2026, Salesforce put a name on something the rest of the industry had been circling for months: Headless agents. AI agents that operate software through APIs, MCP tools, and command lines, with no user interface and no human clicking through every step. The API is the UI.

The label matters less than what it represents: a shift in who is actually using your enterprise data. For the last decade, the answer has been people, with some automation. Going forward, the answer is increasingly agents – autonomous, parallel, fast, and operating at a scale no team of people could match.

That shift breaks assumptions that data infrastructure was never tested against.

Three things change when agents become the primary consumer

According to the EY AI Pulse Survey, 83% of executives say AI initiatives would move faster with stronger data infrastructure. That gap was already the #1 enterprise AI roadblock before headless agents entered the conversation. Agents are widening it, not closing it.

Scale becomes unmanageable 

You are no longer running one pipeline on a dataset. You are running dozens of concurrent agents, each reading and writing to the same shared data. The informal conventions that used to keep people out of each other’s way – directory prefixes, naming schemes, a Slack message asking “are you using this?” – do not survive contact with dozens of parallel autonomous processes.

Errors spread faster 

A human reviews work before it is published. An agent does not. When an agent makes a mistake, that mistake reaches production before anyone notices. By the time a human catches it, the corrupted data has already propagated to downstream models, dashboards, and decisions.

Governance collapses 

Regulators, auditors, and compliance teams will ask what your agents did with your data. “The model decided” is not a regulatory answer. Neither is “the logs are scattered across our orchestrator, our LLM provider, our object store, and our observability tool.” You need a single, complete, structured record of every change an agent made, pinned to the exact data state the agent operated on.

These are not new problems. They are the same key pain points lakeFS was built to address – data access and collaboration bottlenecks, reproducibility challenges, data quality issues, slow recovery from failure, and no audit trail for compliance. The difference is that the volume, velocity, and variance of agent-driven workloads push them from painful to untenable.

How lakeFS serves agent workloads

lakeFS is the control plane for AI-ready data. It sits between data storage and the tools, people, and agents that consume it. It does not replace any existing infrastructure. It adds a light layer in between. As a result, every agent gets its own isolated branch covering the full set of multimodal data it needs. Changes are committed, validated, and merged under policy. Agents operate at machine speed without corrupting shared state. Every action is auditable down to the exact data state the agent saw.
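The branch-per-agent mechanics above can be sketched with a toy model. This is an illustration of the branching semantics, not the lakeFS implementation, and every name in it is hypothetical:

```python
# Toy model of zero-copy branching: a branch is a mapping from logical
# paths to object versions, so creating a branch copies pointers, never
# data. Illustrative only; not how lakeFS is implemented internally.

class ToyRepo:
    def __init__(self):
        self.branches = {"main": {}}          # branch -> {path: version}

    def create_branch(self, name, source="main"):
        # Zero-copy: duplicate the pointer table, not the objects.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch, path, data):
        # Writes land only on the agent's own branch.
        self.branches[branch][path] = data

    def merge(self, source, dest):
        # Publish the branch's state into the destination.
        self.branches[dest].update(self.branches[source])

    def drop_branch(self, name):
        # A bad agent run is discarded wholesale; nothing else changes.
        del self.branches[name]

repo = ToyRepo()
repo.write("main", "features/table.parquet", "v1")

repo.create_branch("agent-run-1")
repo.write("agent-run-1", "features/table.parquet", "v2")

# Isolation: production still sees v1 until the merge.
assert repo.branches["main"]["features/table.parquet"] == "v1"

repo.merge("agent-run-1", "main")
assert repo.branches["main"]["features/table.parquet"] == "v2"
```

The point of the sketch is the pointer table: because a branch starts as a copy of pointers, giving every agent run its own branch costs almost nothing, regardless of dataset size.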

That delivers the three things headless agents need from their data infrastructure, and the three outcomes that matter most for any team running agents on shared data: isolation, reproducible (verifiable) results, and governance with a full audit trail.

Isolation: contain the blast radius

When an agent works on a branch, it is not writing to production. It is writing to its own zero-copy view of your repository. If the agent produces something wrong, the wrong data lives on the branch, not in your production pipelines. Revert the branch and the problem is gone. Recovery that used to take hours or days takes seconds.

Branches in lakeFS span structured tables, unstructured files, images, video, logs, and metadata – the full dataset, even across different source systems. That is important because agents rarely touch just one kind of data. An agent generating synthetic training examples works across raw images, a feature table, and a manifest file, all at once. Isolation only works if every piece of what the agent touches is isolated together.

Reproducibility: replay any agent run

Agents are non-deterministic by design. The same prompt and the same data will not always produce the same result. That is a feature for exploration and a liability for accountability.

lakeFS gives you a commit for every state of your data. Tie the agent run to the commit, and the run becomes reproducible. When something goes sideways – a bad output, a customer escalation, a compliance question – you can pin the commit the agent saw, reconstruct the exact data view, and re-execute.
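The pinning idea is simple enough to sketch. In this toy model (hypothetical names, not the lakeFS implementation), each commit is an immutable snapshot addressed by a hash of its contents:

```python
import hashlib
import json

# Illustrative sketch: each commit freezes a snapshot of the data state,
# addressed by a content hash, so an agent run tied to a commit ID can
# be replayed against the exact view it saw. Names are hypothetical.

commits = {}                       # commit_id -> frozen snapshot

def commit(snapshot):
    payload = json.dumps(snapshot, sort_keys=True).encode()
    commit_id = hashlib.sha256(payload).hexdigest()[:12]
    commits[commit_id] = dict(snapshot)    # immutable from here on
    return commit_id

state = {"prompts/input.json": "v1", "images/cat.png": "v1"}
run_pin = commit(state)            # the agent run records this ID

# Data keeps moving after the run...
state["prompts/input.json"] = "v2"
commit(state)

# ...but the pinned commit reconstructs exactly what the agent saw.
assert commits[run_pin]["prompts/input.json"] == "v1"
```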

Governance: one audit trail across everything

Agent actions live on branches. Merges are gated by policy. Every commit carries metadata about what happened, by whom, to which data. The result is a single, queryable record of agent-driven data change, instead of evidence scattered across orchestrators, model providers, and cloud logs.
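What "a single, queryable record" means in practice can be sketched in a few lines. The field names here are hypothetical stand-ins, not the lakeFS commit schema:

```python
# Illustrative sketch of a queryable audit trail: every agent commit
# carries structured attribution metadata, so "what did agent X change?"
# is a filter over one log, not a forensic exercise across systems.

commit_log = [
    {"id": "a1", "agent": "labeler-bot", "run_id": "r-101",
     "paths": ["labels/batch7.json"], "message": "relabel batch 7"},
    {"id": "a2", "agent": "synth-gen", "run_id": "r-102",
     "paths": ["images/synth/", "features/table.parquet"],
     "message": "add synthetic examples"},
    {"id": "a3", "agent": "labeler-bot", "run_id": "r-103",
     "paths": ["labels/batch8.json"], "message": "relabel batch 8"},
]

def changes_by(agent):
    """Every change a given agent made, in commit order."""
    return [c for c in commit_log if c["agent"] == agent]

assert [c["id"] for c in changes_by("labeler-bot")] == ["a1", "a3"]
```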

For regulated industries – financial services, healthcare, insurance, autonomous systems, defense – this is not a nice-to-have. When “what did your agents do?” becomes an audit question, the answer needs to be more than a trace through three disconnected systems.

Why only a repository-level control plane can provide AI-ready data for agents

A lot of infrastructure claims to be “AI-ready” or “agent-ready.” Most of it is solving a narrower problem than the one headless agents actually present.

Table-format branching only covers one format. Catalog-level governance stops at metadata and does not protect the underlying files. Agent runtime layers give agents tools and policies but leave the data beneath them undefended. A plain file-access layer on top of object storage gives agents file I/O and nothing else – no versioning, no isolation, no merge, no audit.

lakeFS branches happen at the repository level, across any object store, spanning structured and unstructured data together as one coherent state. The same commit that versions your Iceberg table also versions the image directory that sits next to it, the JSON manifest that references both, and the metadata you attached to all three. That is the substrate headless agents actually need – and it is what the lakeFS control plane has provided all along.


What makes this work for agents specifically

A few characteristics of lakeFS matter especially for agent workloads:

  • Standard file system access. Agents read and write lakeFS the same way they use any file system. No custom MCP server, no SDK install, no special integration. Agents that can read and write files can use lakeFS.
  • Multimodal in one place. Videos, images, audio, structured tables, metadata – all versioned together, all reachable through the same interface.
  • Branch-scoped credentials. An agent’s token is valid only for its own branch. Agents cannot read or write outside their workspace by construction.
  • Attribution in every commit. Agent identity, run ID, and context land in the commit metadata. The audit trail answers “which agent, which run, which prompt” without extra plumbing.
  • Policy-gated merges. Schema checks, quality validations, and human review where required – the gates agents merge through are reviewable and testable like any other piece of infrastructure.
  • Human in the loop where it matters. Agents can request human review and approval before changes merge into production.
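The policy-gated merges from the list above can be sketched as a gate function that runs every check before publishing. The checks and names are hypothetical stand-ins for what lakeFS expresses through hooks and merge policies:

```python
# Illustrative merge gate: a merge proceeds only if every policy check
# passes against the branch's proposed state. Check logic is a toy.

def schema_check(state):
    # Toy rule: only known file types may land in production.
    return all(k.endswith((".parquet", ".json", ".png")) for k in state)

def quality_check(state):
    # Toy rule: no empty objects.
    return all(v is not None for v in state.values())

POLICIES = [schema_check, quality_check]

def gated_merge(branch_state, main_state):
    failures = [p.__name__ for p in POLICIES if not p(branch_state)]
    if failures:
        return False, failures          # merge blocked, main untouched
    main_state.update(branch_state)     # publish
    return True, []

main = {"features/table.parquet": "v1"}

ok, why = gated_merge({"features/table.parquet": None}, main)
assert not ok and main["features/table.parquet"] == "v1"

ok, why = gated_merge({"features/table.parquet": "v2"}, main)
assert ok and main["features/table.parquet"] == "v2"
```

Because the gate is ordinary code, it can be reviewed and tested like any other piece of infrastructure, which is exactly the property the list above calls out.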

The future is agentic. The infrastructure has to be ready.

The shift to headless agents is not coming. It is here. The question every data and AI leader is now asking is whether the infrastructure under their agents is ready for the way agents actually work – in parallel, at speed, without human review, across multimodal data.

Isolation, reproducibility, and governance are not ideals to bolt on for agents. They are what the control plane for AI-ready data has always delivered. By the time you need a branch for every agent, a full audit trail, and instant recovery, it is too late to start building them.

Book a demo or request a free trial to learn how the lakeFS Control Plane for AI-ready Data provides agents that rely on large, multimodal datasets with isolated access, verifiable results, and built-in governance.


Glossary

Headless agent: An AI agent that operates software through APIs, MCP tools, or command lines, without a user interface or human-driven step-by-step control.

Control plane for AI-ready data: A layer that sits between data storage and the tools, people, and agents that consume the data, providing governance, reproducibility, and reduced access friction.

Repository-level branching: Branching that spans an entire data repository (structured tables, unstructured files, metadata) as one coherent state, rather than branching one table or one catalog at a time.

Zero-copy branch: An isolated environment created without duplicating data. Branches in lakeFS are zero-copy by design.

Multimodal data: Data that spans modalities: structured tables, semi-structured logs and JSON, columnar formats like Parquet, open table formats like Iceberg, and unstructured images, audio, video, and metadata.

Frequently Asked Questions

What does the lakeFS control plane add on top of plain object storage?

Plain object storage gives agents file I/O – read, write, delete. That’s it. There is no isolation between concurrent agents, no atomic merges, no record of who changed what, and no way to recover from a bad agent run except by hand. The control plane adds branches, commits, merges, and audit on top of your existing storage, without moving or copying your data.

Do agents need a custom integration to use lakeFS?

No. Agents use lakeFS through standard file operations, the same way they use any file system. There is no custom MCP server to deploy, no SDK to install, and no special integration to maintain. Agents that can read and write files can use lakeFS.

How is repository-level branching different from Iceberg branching or catalog-level governance?

Iceberg native branching covers one table at a time and depends on the compute engine. Catalog-level governance protects metadata but does not version the underlying files. lakeFS branches at the repository level, across any object store, spanning structured tables, unstructured files, and metadata together as one coherent state.

Does lakeFS replace my existing data infrastructure?

No. lakeFS complements existing infrastructure. It adds a control plane layer between your storage and the tools that consume it, without replacing any of them. Data stays in place, under your control, with no copying or duplication.

What kinds of data does lakeFS version?

Multimodal data of all kinds. Structured tables, semi-structured logs and JSON, columnar formats like Parquet, open table formats like Iceberg (REST catalog), Delta, and Hudi, and unstructured images, audio, video, and metadata. All of it can live on the same branch and be versioned together.

Where does my data live?

In your object storage, where it already is. lakeFS works on AWS S3, Azure Blob, Google Cloud Storage, any S3-compatible object store, and POSIX storage. Data stays in place, under your control, with no copying or duplication. Deployment options include public and private cloud, on-premises, government clouds (AWS GovCloud, Azure Government), and air-gapped environments.

How does lakeFS audit what agents do?

Every change made by an agent is captured as a commit with structured attribution metadata – agent identity, run ID, and context. Merges into production are gated by declarative policy. The result is a single, queryable record of every agent-driven data change, with full lineage and reproducibility built in.

How does lakeFS keep agents inside their own workspace?

Through branch-scoped credentials. Each agent’s token is valid only for its own branch, so an agent cannot read or write outside its workspace by construction. lakeFS supports SSO, SCIM, AWS IAM roles and short-lived tokens (STS), role-based access control, and AWS PrivateLink for network-level isolation.
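The branch-scoping check described here can be sketched as a path test. The token shape is hypothetical; only the lakefs://repo/branch/path addressing convention is taken from lakeFS:

```python
# Illustrative sketch of branch-scoped credentials: a token names the
# one repository and branch it is valid for, and every requested path
# is checked against it. Token format here is hypothetical.

def parse_uri(uri):
    # e.g. "lakefs://my-repo/agent-run-7/images/cat.png"
    repo, branch, *rest = uri.removeprefix("lakefs://").split("/")
    return repo, branch, "/".join(rest)

def allowed(token, uri):
    repo, branch, _ = parse_uri(uri)
    return token["repo"] == repo and token["branch"] == branch

token = {"repo": "my-repo", "branch": "agent-run-7"}
assert allowed(token, "lakefs://my-repo/agent-run-7/images/cat.png")
assert not allowed(token, "lakefs://my-repo/main/images/cat.png")
```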

Check out lakeFS.io, book a demo or request a free trial to learn more about the lakeFS Control Plane for AI-ready Data and how it provides agents that rely on large, multimodal datasets with isolated access, verifiable results, and built-in governance.
