

Tal Sofer

Tal Sofer is a product manager at Treeverse, the company behind lakeFS.

Published on July 16, 2025

Open source software has fundamentally reshaped technology—delivering unmatched flexibility, low friction, and rapid innovation. For some teams, it’s a philosophical commitment. For others, it’s the fastest path to building.

lakeFS supports both models. For most data teams, the journey starts with open source and evolves over time. lakeFS open source offers a robust foundation for data version control. It enables organizations to move fast, integrate quickly, and build confidence in their data operations—without bureaucratic friction or upfront costs. That accessibility drives adoption and experimentation.

But the very strengths of open source—speed, autonomy, and flexibility—can become liabilities as scale sets in. The absence of enforced controls and heavyweight operations, while ideal in early phases, can lead to fragile systems later. The true cost of maintaining an open-source solution at enterprise scale often remains unseen until it becomes a pressing reality, making the transition to a more comprehensive solution feel like a sudden necessity rather than a planned evolution. The payment, in these scenarios, is not for something entirely new, but for the accumulating, previously unquantified costs of operating open source at scale.

The Unseen Costs of Scale: When “Free” Gets Expensive

For growing organizations, cost isn’t always financial. It often appears as engineering drag—manual processes, longer recovery times, operational risk, and complexity.

Teams running lakeFS open source at scale have already chosen to invest. They’re just paying in different currencies: headcount, time, and focus. That is a legitimate choice, but it has real consequences.

This is a familiar progression. Git becomes GitHub Enterprise. Spark becomes Databricks. Kafka becomes Confluent. When tools grow mission-critical, the pressure to stabilize them grows too. Data version control follows the same arc. As scale increases, workarounds emerge—engineered solutions to problems already solved by native enterprise features.

These hidden costs are often unmeasured, but real:

  • Time spent reinventing the wheel
  • Delays in root cause analysis
  • Risky compliance gaps
  • Engineering focus misallocated to infrastructure instead of product

It is a quiet but constant drain.

Real-World Friction: Developer and DevOps Pain Points

As organizations scale, initial advantages give way to real pain points—especially for developers and platform teams. These aren’t hypotheticals. We have seen them, and we’ve built a checklist to help identify them. But checklists can feel abstract, so let’s get specific.

Navigating Data Governance & Compliance at Scale

Security and compliance are not just important—they are existential.

lakeFS open source doesn’t offer native SSO, RBAC, or IAM integrations. You can absolutely build workarounds. But at what cost?

Without these controls, every new user or project requires manual provisioning, permission audits, and case-by-case debugging. This drains engineering time and introduces security gaps. Forgotten users, over-provisioned access, and inconsistent permissioning aren’t rare—they are inevitable. Auditing for SOC2 or GDPR compliance becomes a slow, manual ordeal.
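To make this concrete, here is a minimal sketch of the kind of provisioning script a team ends up owning. It is a hypothetical example built on lakeFS’s REST authentication endpoints; the exact paths, payloads, and the group name shown are assumptions to verify against your version’s API reference.

```python
# Hypothetical sketch of manual user provisioning against the lakeFS API.
# Endpoint paths and payloads are illustrative assumptions; check your
# lakeFS version's API reference before relying on them.
import requests

LAKEFS = "https://lakefs.example.com/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")

def provision_user(user_id: str, group: str) -> None:
    # 1. Create the user.
    requests.post(f"{LAKEFS}/auth/users",
                  json={"id": user_id}, auth=AUTH).raise_for_status()
    # 2. Attach the user to a group that carries the desired policies.
    requests.put(f"{LAKEFS}/auth/groups/{group}/members/{user_id}",
                 auth=AUTH).raise_for_status()

def audit_members(group: str) -> list[str]:
    # Pull the member list so a human can diff it against the HR roster,
    # exactly the kind of manual review that native SSO/RBAC would remove.
    resp = requests.get(f"{LAKEFS}/auth/groups/{group}/members", auth=AUTH)
    resp.raise_for_status()
    return [member["id"] for member in resp.json()["results"]]

provision_user("new.engineer", "Developers")  # repeat for every joiner...
print(audit_members("Developers"))            # ...and audit by hand for leavers
```

Every joiner, leaver, and project change reruns this loop by hand; nothing in it is hard, but nothing in it is automatic either.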

Manual governance transforms security from a preventative measure into a resource-heavy scramble. That is not just inefficient—it is risky.

And then there’s audit logging. Many enterprises need a complete, reliable log trail for compliance and incident response. Open source users often find themselves assembling logs across disparate components—or realizing too late that they lack coverage entirely. The second-order effects—data silos, inconsistent access, slow forensic response—are more damaging than the missing logs themselves.
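As a rough illustration of that stitching work, the sketch below merges JSON-lines audit events from several hypothetical sources into one time-ordered trail. The file names and field names ("time", "actor", "action") are assumptions, not a standard format.

```python
# Minimal sketch of the log-stitching chore described above: merge JSON-lines
# audit events from several components into one time-ordered trail.
import json
from pathlib import Path

SOURCES = ["lakefs-server.jsonl", "object-store-access.jsonl", "pipeline-runner.jsonl"]

def load_events(path: str):
    for line in Path(path).read_text().splitlines():
        event = json.loads(line)
        # Assumes every source records an ISO-8601 "time" field.
        yield event["time"], path, event

# One sorted trail; any source that lacks coverage simply leaves silent gaps,
# which is the failure mode auditors tend to find first.
trail = sorted((e for src in SOURCES for e in load_events(src)),
               key=lambda e: e[0])
for ts, source, event in trail:
    print(ts, source, event.get("actor"), event.get("action"))
```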

Bridging the Gaps in the Data Stack

lakeFS does not operate in isolation. It sits inside a complex data ecosystem. And when native integrations are missing, you start spending engineering cycles stitching tools together.

Take orchestration: without native support for Airflow or Dagster, teams write custom scripts to handle branching, merging, and committing. These scripts are brittle, hard to maintain, and difficult to debug. The result: longer development cycles, increased maintenance burden, and a fragile workflow that breaks under pressure.
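To illustrate, here is a minimal sketch of the kind of glue code this produces: branch-per-run isolation written directly against the lakeFS REST API. The endpoint paths follow the v1 API but should be treated as assumptions to check against your deployment; retries, error handling, and cleanup are omitted, which is exactly why such scripts grow brittle.

```python
# Sketch of the glue code such pipelines accumulate: branch-per-run
# isolation implemented with raw calls to the lakeFS REST API.
import requests

LAKEFS = "https://lakefs.example.com/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
REPO = "analytics"

def create_run_branch(run_id: str) -> str:
    # Isolate this pipeline run on its own branch off main.
    branch = f"run-{run_id}"
    requests.post(f"{LAKEFS}/repositories/{REPO}/branches",
                  json={"name": branch, "source": "main"},
                  auth=AUTH).raise_for_status()
    return branch

def commit_and_merge(branch: str, message: str) -> None:
    # Commit whatever the run wrote, then merge back to main.
    requests.post(f"{LAKEFS}/repositories/{REPO}/branches/{branch}/commits",
                  json={"message": message}, auth=AUTH).raise_for_status()
    requests.post(f"{LAKEFS}/repositories/{REPO}/refs/{branch}/merge/main",
                  json={}, auth=AUTH).raise_for_status()

# In an Airflow DAG these would be wrapped in PythonOperator tasks, with the
# run_id threaded through; every new pipeline repeats the same wiring.
branch = create_run_branch("2025-07-16T00-00")
# ... pipeline tasks write data to the branch here ...
commit_and_merge(branch, "nightly ETL output")
```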

Now layer on table formats like Iceberg or Delta, or multi-cloud storage. Teams often need transactional guarantees, schema evolution support, or hybrid-cloud coordination—all of which require custom engineering in OSS.

There is nothing inherently wrong with this approach. Developers develop; this is what they are good at. The question is whether it is a good use of their time.

A fragmented data stack, held together by custom scripts and glue code, makes it incredibly difficult to achieve true end-to-end data lineage, auditability, and reproducibility—all critical for reliable ML models and regulatory compliance. Engineering confidence drops, business agility slows, and the result is a system that delivers less value at higher cost.

Operational Complexity & Performance Bottlenecks

As data volumes grow, operational challenges multiply.

Start with garbage collection. In lakeFS, every branch, commit, and change creates new metadata objects. Over time, these pile up. GC cycles stretch. Systems slow. Developer productivity drops as they wait for routine operations to complete. And infra costs climb as teams scale compute just to keep up.
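For context, retention in open-source lakeFS is expressed as garbage-collection rules that someone must define and keep aligned with business policy, and the GC job itself must be run and scaled by your team. Below is a minimal sketch of pushing such rules through the API; the endpoint path and rule schema here are assumptions to verify against your lakeFS version.

```python
# Sketch: encoding retention policy as lakeFS garbage-collection rules and
# pushing them through the API. Endpoint path and rule schema are assumptions;
# consult the API reference for your lakeFS version. Running the GC job itself
# (typically a Spark job) and scaling it as metadata grows remains your work.
import requests

LAKEFS = "https://lakefs.example.com/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
REPO = "analytics"

gc_rules = {
    "default_retention_days": 21,  # business default for all branches
    "branches": [
        {"branch_id": "main", "retention_days": 90},        # keep prod history longer
        {"branch_id": "experiments", "retention_days": 7},  # prune scratch work fast
    ],
}

requests.put(f"{LAKEFS}/repositories/{REPO}/settings/gc_rules",
             json=gc_rules, auth=AUTH).raise_for_status()
```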

Then there’s the lack of native High Availability (HA) and Disaster Recovery (DR). Without built-in resilience, platform teams must engineer complex failover systems themselves—planning for failure scenarios that enterprise features solve out of the box. The time spent on this work doesn’t drive business value. It’s insurance against fragility.

You know it’s gotten bad when your internal platform team is effectively rebuilding what a commercial solution already offers:

  • Managed GC
  • Automated data retention aligned with business rules
  • Availability guarantees
  • Cross-region failover
  • Security hardening

This is payment—in engineering hours. And that cost compounds.

The Strategic Shift: Investing in Stability, Not Just Scale

Enterprise lakeFS isn’t a commercial upsell; it represents a strategic reallocation of resources—from reactive problem-solving to proactive innovation.

Native SSO, full audit trails, deep integrations with orchestrators and table formats, and managed operational capabilities (like GC, DR, and SLAs) shift the burden away from your internal teams. They eliminate the need for patchwork solutions. And they free your engineers to focus on what differentiates your business.

Add to that faster support response, roadmap visibility, and expert guidance—and suddenly the conversation isn’t about “buy vs. build.” It is about building what matters.

When lakeFS is at the heart of production ML pipelines, analytics platforms, or ETL infrastructure, the cost of downtime or failure grows. The need for confidence and control grows with it. Enterprise lakeFS gives teams the tools and support to move faster, safer, and smarter.

Understanding Your Stage: A Readiness Framework

The decision to move to enterprise isn’t binary. It’s a progression. Most organizations go through stages:

  1. Early Adoption: Small team, manageable data, open source fits perfectly.
  2. Emerging Needs: Usage grows. Inefficiencies appear. Manual processes creep in.
  3. Enterprise Pressure Point: Workarounds abound. Risks grow. Engineering time gets consumed by infrastructure.
  4. High Readiness: You are essentially acting like an enterprise customer—just without the support.

If any of this sounds familiar, you are not alone. Many lakeFS customers followed this path. Their decision to upgrade wasn’t about abandoning open source—it was about building on it.

Summary

lakeFS open source is a powerful, indispensable foundation. It gets teams moving fast with strong principles and low friction. But as scale sets in, the costs shift. Not in licensing fees—but in hidden burdens: engineering hours, operational risk, technical debt, and opportunity cost.

Moving to enterprise isn’t a repudiation of open source. It’s a recognition that your data operations—and your business—have matured. It’s a strategic decision to reduce the total cost of ownership, increase system resilience, and empower your engineers to focus on innovation instead of infrastructure.

This isn’t about selling you something you don’t need. It’s about helping you understand when your organization is ready to stop paying in toil—and start investing in capability. We’re here to help you make that decision. If you’re ready, we’ll guide the way. If you’re not, we’ll wait. Access the checklist here or get in touch directly.
