While AI’s promise of increased efficiency and innovation is clear, executing on it is proving to be a major challenge, especially at scale.
At lakeFS, we’ve been at the center of this challenge since the beginning. And today, we are excited to announce a $20 million growth funding round, which will help us double down on innovation and focus on the features that matter most to our customers and to enterprise-scale AI.
Our community has always been the source of insight into pain points, challenges, and emerging use cases of data version control. And as AI projects started to take center stage within many organizations, we saw an increasing number of people asking questions in our community, pointing to a growing need in AI projects to deploy lakeFS as a foundational layer for data and AI operations.
Let’s take a closer look at the data infrastructure challenges AI teams face today and why lakeFS is well-positioned to solve them.
Where does the AI data infrastructure gap come from?
Organizations are in a high-stakes race to unlock value and gain a competitive edge through AI solutions. But while ambitions are sky-high, the reality on the ground tells a different story.
According to EY’s most recent survey on AI adoption, 83% of executives said they would move faster if they had stronger data infrastructure. Two-thirds admitted that the lack of it is directly holding them back.
This is what we mean when we talk about the AI data infrastructure gap: the growing disconnect between the rising importance of data and the ability to manage it.
This is becoming a major roadblock for AI, MLOps, and data teams building out their AI infrastructures today. Instead of focusing on innovation, they’re often stuck dealing with incorrect, out-of-date, and non-reproducible data.
This results in time-consuming manual management of petabyte-scale datasets, piecing together sub-datasets for individual experiments, or trying (and often failing) to reproduce the exact data used to train a model. The process is slow, error-prone, and potentially risky.
Without robust data infrastructure, AI’s promise can become a tangle of operational challenges, such as delayed launches, ballooning costs, and underwhelming outcomes.
How does lakeFS fit into the AI lifecycle?
Just as Git transformed the way software is developed, lakeFS is doing the same for enterprise AI by bringing version control to data.
Built to handle massive volumes of structured, semi-structured, and unstructured data, lakeFS gives teams the control, safety, and reproducibility they need to scale AI with confidence.
With lakeFS, data, AI, and ML teams can:
- Experiment faster on massive datasets without the cost or complexity of duplicating storage
- Reproduce models and pipelines on demand for compliance, auditing, or debugging
- Collaborate at scale with full visibility and control over every change to data, models, and environments
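For readers unfamiliar with the workflow, here is a brief sketch of what zero-copy experimentation looks like with the lakectl CLI. The repository and branch names are illustrative, not a reference to any real deployment.

```shell
# Create an isolated branch for an experiment. This is a metadata-only
# operation: no objects are copied, regardless of dataset size.
# (Repository name "my-repo" and branch names are hypothetical.)
lakectl branch create lakefs://my-repo/experiment-1 \
  --source lakefs://my-repo/main

# ... modify the data on the branch (reprocess, filter, augment) ...

# Commit the change so this exact state of the data can be reproduced later.
lakectl commit lakefs://my-repo/experiment-1 \
  -m "Augment training set with synthetic samples"

# If the experiment pans out, merge it back atomically; if not, the
# branch can simply be deleted and main is untouched.
lakectl merge lakefs://my-repo/experiment-1 lakefs://my-repo/main
```

Because every commit is addressable, a model can later be traced back to the precise data it was trained on, which is what makes on-demand reproducibility for auditing and debugging possible.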
lakeFS impacts data preparation and the research and training phases, which are the most time-consuming and expensive. Our data versioning capabilities speed up the process by addressing the pain of preparing data for AI/ML in terms of both data quality and data management. This translates into a much faster time to market, allowing teams to deploy quality models to production where organizations can actually use them.
Moreover, lakeFS’s ability to support any data format is crucial for the complex multimodal data environments that teams deal with in generative AI projects.
A data versioning infrastructure, across environments and tools
Enterprise-scale AI reaches across individual environments and tools, requiring an infrastructure-based approach to data management. Such a layered approach ensures all users in an organization can access data versions across all data types with full lineage and context. In contrast, a tool-specific capability would silo versioning, preventing data teams from translating this benefit to a broader organizational context.
This approach, encapsulated in lakeFS, lets teams broaden the impact of data versioning and maintain golden datasets for AI training, making it cheaper and easier to unlock more data for AI use and accelerate technology adoption.
By integrating lakeFS into their stack, teams are reducing time to market for AI initiatives while improving data and model quality. In a competitive landscape where speed and precision are crucial, this advantage can determine whether an organization leads the market or lags behind.
This new investment supports the rapidly growing demand for lakeFS, fueling the expansion of its engineering and go-to-market teams, accelerating product development, and deepening global enterprise partnerships. The funding builds on a period of strong momentum for lakeFS, including Fortune 100 customer wins, triple-digit community growth, and recent product releases such as distributed data management, lakeFS Mount, and Iceberg REST catalog support.

