
Unity Catalog and the Quiet Return of Vendor Lock-In

Oz Katz

Last updated on June 19, 2026


Databricks built its reputation on openness. Spark. Delta Lake. MLflow.

A company that rose by betting on open ecosystems over proprietary silos.

Which is why Unity Catalog feels like such a sharp turn.

This Week at Databricks’ Data + AI Summit, the Pattern Got Harder to Ignore

At this week’s Data + AI Summit, Databricks expanded what Unity Catalog controls. It now governs models, agents, tools, and MCP servers at runtime.

The announcement’s own framing says it plainly: “In the agentic era, you can’t afford to be locked in.” And yet the entire release is about expanding what lives inside Unity Catalog, with no mention of the open source edition and no investment in working with tables managed by other catalogs. The interoperability story is still one-directional: other engines can access tables Databricks manages, but Databricks won’t fully support tables others manage.

On paper, Unity Catalog is a governance layer: centralized permissions, auditing, discovery, lineage.

In practice, it has become something else entirely: an artificial moat, allowing Databricks to determine what you are allowed to do with your own data.

This isn’t about whether Unity Catalog is good software. Because it is.

This is about how Databricks is using it to re-centralize power, restrict interoperability, and quietly reintroduce vendor lock-in: precisely the thing the modern data stack was supposed to eliminate.

[Image: Databricks Unity Catalog meme. We’re not the only ones to notice; it’s one of many memes popping up on the topic.]

The Core Problem: “You can, as long as we own it”

The key design choice is simple:

Databricks Serverless, SQL, governance, AI features, and, more recently, Iceberg writes only work fully when the table is managed by Unity Catalog.

If your data is:

In AWS Glue
In an OSS Hive metastore
In a third-party Iceberg catalog
In a custom or open catalog service

Databricks will let you read it, but you often cannot write, mutate, or optimize it, or use it in many parts of the Databricks offering.

That is not a technical inevitability. It is an artificial product boundary: If you want to enjoy the full functionality of the Databricks platform, you have to hand over the keys to your data.

Ironically, even Databricks’ own Unity Catalog Open Source Edition suffers from these limitations.

You can’t run your own Unity Catalog and use it across the full breadth of Databricks’ offering.

“Foreign Tables” in Name Only

Unity Catalog supports foreign catalogs, which is how it delivers on its original promise to be a federated catalog. Look closely, though, and they are heavily constrained: foreign tables are read-only, support no DML or DDL, and offer no symmetric interoperability.

Databricks’ answer is always the same: “Just convert it to a Unity Catalog–managed table.”

That’s not federation – that’s absorption.

Iceberg: Open Format, Closed Control

Databricks now loudly champions Apache Iceberg, to the tune of a reported 2 billion dollars.

And that’s a good thing! Iceberg is clearly becoming the de facto standard, widely adopted across the industry. The problem is how Databricks supports it: writes work only when Iceberg tables are managed by Unity Catalog and those tables are accessed via Databricks’ Iceberg REST Catalog endpoint.

Got Iceberg tables managed in a different catalog? We’re back to read-only support and degraded UX.

An open table format without open, interchangeable catalogs is only open in name.

Competitors Don’t Force This Tradeoff

This is where the contrast becomes uncomfortable.

Snowflake

Snowflake, historically the more restrictive, proprietary vendor, does not impose such limitations. It too is standardizing on Iceberg, even to the extent of championing its own open source catalog, now part of the Apache Software Foundation. And Iceberg tables are quickly becoming first-class citizens:

External Iceberg tables are available throughout the offering for both reads and writes
Iceberg support works with multiple catalogs
Governance does not require metadata capture into Snowflake-owned control planes

Snowflake locks you in through performance and execution, not by withholding functionality. Imagine that: winning by building a better product.

AWS Glue

Glue, being just one of many ways to run analytics and ML inside AWS, is open by nature. You can use the Glue Data Catalog with Hudi, Iceberg, or Delta Lake, use S3 Tables, or bring your own self-managed catalog. If there’s a dollar to be made on compute, AWS will happily charge it.

Dremio, ClickHouse & Starburst

Unsurprisingly, these are all part of the broader modern data stack. Each has its own means of managing data internally (Dremio even collaborated with Snowflake on its Polaris Catalog), but today all three support full reads and writes on Apache Iceberg tables managed in external catalogs.

The Role of FUD

Many customers report the same pattern coming from their Databricks account managers: they are being told that future features will only work with Unity Catalog, that they should migrate existing data to Unity, and that their existing choice of catalog might not work long term with Databricks.

None of these statements are outright promises, and none are contractual, but they are directional, delivered by sales and solution architects whose incentives are clear: get their customers fully migrated to Unity Catalog and ensure their retention.

Why This Matters More Than It Seems

This is bigger than Databricks vs Glue or Snowflake.

It’s about the future shape of the open data stack.

The promise of the modern lakehouse was open formats, interchangeable engines and catalogs, and choosing the best tool for the job.

Unity Catalog undermines that by making the catalog itself the choke point. If this strategy succeeds, it will also twist competitors’ arms into responding in kind. That is a net negative for the ecosystem and sets us back decades in terms of choice, openness, and innovation.

Worst of all, if you’re locked in, you have far less leverage to negotiate. That drives up prices and leads to worse products: the literal meaning of enshittification.

The Irony: Databricks Knows Better

This is the most frustrating part.

Databricks is strongly rooted in open source. In fact, it built its brand fighting proprietary warehouses and benefited massively from community trust. Spark was truly a revolutionary project.

And yet, today, it is centralizing control in a proprietary catalog, limiting interoperability by design, and treating openness as an ingestion strategy, not an operating principle.

Unity Catalog is not evil, but the way it is being positioned as mandatory infrastructure is deeply at odds with the ecosystem Databricks once championed.

The Fork in the Road

Databricks still has a choice.

Unity Catalog could become a state-of-the-art open catalog that customers would be happy to adopt.

Or, if it remains on its current path, it will gradually become more and more of a gatekeeper: a means of returning to the very lock-in the lakehouse was meant to dismantle.

Databricks is an incredible company with an incredible product.

But on this issue, they are currently on the wrong side of history and, increasingly, on the wrong side of the open source community that made them possible in the first place.

The open data stack does not need another empire. It needs interoperability without permission.

And it’s not too late for Databricks to remember that.
