Yesterday, OpenAI launched gpt-oss-120b and gpt-oss-20b, marking the company’s first open-weight models since GPT-2 in 2019. This strategic shift represents far more than a product release—it signals a fundamental transformation in how large organizations, particularly in regulated industries, approach AI infrastructure and data management.
OpenAI’s Strategic Return to Open Source
The gpt-oss models are built for flexibility, efficiency, and high performance both in the cloud and at the edge: two fully downloadable, customizable large language models (LLMs) with no license fee or API gate. Both are mixture-of-experts (MoE) models that use a 4-bit quantization scheme (MXFP4), enabling the larger model to fit on a single H100 GPU while the smaller one runs within 16GB of memory, making it well suited to consumer hardware and on-device applications.
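A quick back-of-envelope calculation shows why 4-bit quantization makes these memory claims plausible. The sketch below uses approximate parameter counts inferred from the model names (the exact counts may differ slightly) and counts only weight storage, ignoring activations and KV cache:

```python
# Back-of-envelope memory estimate for 4-bit (MXFP4) quantized weights.
# Parameter counts are approximate; activations and KV cache are ignored.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

# gpt-oss-20b at roughly 21B parameters, 4 bits each: about 10.5 GB of
# weights, leaving headroom within a 16 GB consumer GPU.
print(round(weight_memory_gb(21e9, 4), 1))   # 10.5

# gpt-oss-120b at roughly 117B parameters: about 58.5 GB of weights,
# which fits on a single 80 GB H100.
print(round(weight_memory_gb(117e9, 4), 1))  # 58.5
```

At 16-bit precision the larger model's weights alone would need well over 200 GB, which is why the quantization scheme, not just the MoE architecture, is what makes single-GPU deployment feasible.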
The timing isn’t coincidental. OpenAI’s move comes in direct response to competitive pressure from Chinese AI labs such as DeepSeek, whose R1 model is free to download yet benchmarks close to paid proprietary models. CEO Sam Altman acknowledged this shift, stating that “Going back to when we started in 2015, OpenAI’s mission is to ensure AGI that benefits all of humanity. To that end, we are excited for the world to be building on an open AI stack created in the United States, based on democratic values, available for free to all and for wide benefit.”
Regulated Industries Drive On-Premises AI Adoption
This open-source release couldn’t come at a more critical time for enterprise adoption. A recent Capgemini report found that 73% of organizations want AI systems to be explainable and accountable to support responsible use. Sovereignty demands, in turn, translate into architectural flexibility: running models on-premises, in a hybrid cloud, or in a sovereign cloud environment when needed.
Regulated industries face unique challenges that make on-premises deployment essential:
Data Sovereignty and Compliance: Regulations such as HIPAA, GDPR, and ITAR impose strict controls on where data can reside and who can access it, in some cases requiring that it remain within national borders or even specific facilities. Financial institutions adopting AI must contend with model risk management, compliance challenges, and evolving regulatory oversight; research from the Financial Stability Board indicates that many banks face structural hurdles in integrating AI while maintaining robust risk controls.
Operational Requirements: In finance, agentic AI systems have improved trade execution efficiency and regulatory compliance at leading Wall Street firms, while in healthcare, the integration of agentic AI into hospital workflows is accelerating diagnoses, reducing errors, and improving patient outcomes. These applications require the low-latency, high-reliability infrastructure that on-premises deployments are best placed to provide.
Security and Control: On-premises AI platforms offer a unique combination of security, performance, and control that cloud-native environments can’t fully replicate, with every part of the AI lifecycle happening behind the company’s firewall.
The Enterprise AI Tech Stack Evolution
As organizations deploy these open-weight models internally, they are discovering that they can run a powerful LLM, close to OpenAI’s top proprietary offerings, entirely on their own hardware, privately and securely, without sending data to the cloud. However, this shift creates new infrastructure demands.
The enterprise AI technology stack now requires:
- Scalable compute infrastructure with GPU optimization
- Robust data management systems for multimodal content
- Version control and reproducibility tools for datasets and models
- Security and compliance frameworks built for AI workloads
- Integration capabilities with existing enterprise systems
Industries such as healthcare, autonomous vehicles, finance, manufacturing, and SaaS leverage AI infrastructure for innovation and operational efficiency, but challenges include high costs of hardware/software integration, data privacy concerns, skill shortages, and managing large-scale data processing needs.
Data Management: The Foundation of Enterprise AI
While powerful models like gpt-oss represent a significant technological advance, they are only as valuable as the data that powers them. In the most recent EY survey on AI adoption, the vast majority (83%) of surveyed executives said AI adoption would be faster if they had a stronger data infrastructure, and 67% said they could move faster on AI but that the lack of data infrastructure is holding them back.
Modern enterprise AI initiatives face unique data challenges:
Managing multimodal data: Handling structured, semi-structured, and unstructured data across varied sources and formats.
Ensuring reproducibility: Being able to reliably recreate training datasets, experiments, and model outputs.
Maintaining data quality: Preventing data errors and enforcing consistent standards throughout the pipeline.
Meeting compliance requirements: Demonstrating traceability, access control, and auditability for regulatory approval.
lakeFS: Git-Like Version Control for Enterprise AI Data
Just as Git revolutionized software development, lakeFS is reshaping enterprise AI by versioning the data that powers it. Designed for massive volumes of unstructured, semi-structured and structured data in data lakes—text, images, audio, video—lakeFS gives organizations control, safety, and reproducibility at scale.
Organizations such as Arm, Bosch, Lockheed Martin, NASA, Volvo, and the U.S. Department of Energy use lakeFS as part of their data management infrastructure. Lockheed Martin, for example, uses lakeFS in its AI factory: any user working with data creates a lakeFS repository containing all the data relevant to their research or model. Within that repository, the team collaborates easily by working on branches and merging good results, and can reproduce any point in time in the model’s development.
lakeFS lets teams manage their data using Git-like operations (branch, commit, merge, etc.) while scaling to billions of files and petabytes of data. Multiple team members can work on the same data concurrently, each creating a separate branch for their experiments, with the ability to tag data to represent specific experimental states, ensuring accurate and reproducible results.
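To make the workflow concrete, here is a minimal in-memory sketch of the branch/commit/tag/merge pattern described above. This is not the lakeFS API; every name in it is hypothetical, and real systems track references to immutable objects rather than copying content:

```python
# Illustrative in-memory model of a Git-like workflow over data.
# NOT the lakeFS API; all names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Repo:
    # Each branch maps object paths to content; commits snapshot a branch.
    branches: dict = field(default_factory=lambda: {"main": {}})
    commits: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

    def branch(self, name: str, source: str = "main") -> None:
        self.branches[name] = dict(self.branches[source])  # cheap copy

    def put(self, branch: str, path: str, content: str) -> None:
        self.branches[branch][path] = content

    def commit(self, branch: str, message: str) -> int:
        self.commits.append((branch, message, dict(self.branches[branch])))
        return len(self.commits) - 1  # commit id

    def tag(self, name: str, commit_id: int) -> None:
        self.tags[name] = commit_id  # pin a reproducible state

    def merge(self, source: str, dest: str) -> None:
        self.branches[dest].update(self.branches[source])

repo = Repo()
repo.put("main", "train/data.csv", "v1")
repo.commit("main", "initial dataset")
repo.branch("exp-cleanup")                      # isolated experiment
repo.put("exp-cleanup", "train/data.csv", "v2")
cid = repo.commit("exp-cleanup", "deduplicated rows")
repo.tag("exp-1-baseline", cid)                 # reproducible reference point
repo.merge("exp-cleanup", "main")               # promote good results
print(repo.branches["main"]["train/data.csv"])  # v2 after merge
```

The point of the sketch is the isolation guarantee: work on `exp-cleanup` never touches `main` until the merge, and the tag lets anyone recover the exact dataset state behind a given experiment.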
The Path Forward: Integration and Best Practices
The convergence of OpenAI’s open-source release and enterprise infrastructure requirements creates unprecedented opportunities, but it also reveals a critical gap in the enterprise AI workflow. With gpt-oss, developers can now build private agents that reason and call tools, deploy on-premises or at the edge with full control, contribute to alignment research using real models, and skip API costs for prototyping and testing.
However, deploying open-source models on-premises is just the first step. A downloaded model knows nothing about a company’s internal documents, terminology, or workflows, so enterprise fine-tuning is essential for high-quality, domain-specific performance. To unlock their potential, organizations must fine-tune these models on their own proprietary data: domain-specific documents, industry knowledge bases, and the specialized workflows that differentiate their business.
This fine-tuning process creates an entirely new set of data infrastructure demands. Organizations need to manage massive datasets for training, track multiple model versions as they iterate, ensure reproducibility for compliance and auditing, and enable data scientists to collaborate safely on shared datasets without costly duplication. This is where robust data version control becomes not just helpful, but essential—and why solutions like lakeFS have become foundational to enterprise AI success.
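One common building block behind the reproducibility and auditing demands described above is a content-addressed manifest: hash every training file, record the digests alongside the model version, and recompute them later to verify that a run used exactly the data it claims. The sketch below uses hypothetical file names and is one simple way to implement the idea, not a description of any particular product:

```python
# Hedged sketch: a content-addressed manifest for dataset reproducibility.
# File names and contents are hypothetical.
import hashlib

def manifest(files: dict) -> dict:
    """Map each file path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

# Record the dataset state at training time.
snapshot = manifest({"train/a.txt": b"records v1", "train/b.txt": b"labels v1"})

# At audit time: recompute and compare to detect silent drift in the data.
current = manifest({"train/a.txt": b"records v1", "train/b.txt": b"labels v2"})
changed = [p for p in snapshot if snapshot[p] != current[p]]
print(changed)  # ['train/b.txt']
```

Because digests are tiny compared to the data itself, a manifest can be stored with every model checkpoint, giving auditors a byte-for-byte answer to "what was this model trained on?" without duplicating the dataset.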
Success in this new landscape requires:
- Unified Infrastructure Planning: AI infrastructure must comply with relevant industry regulations, such as HIPAA in healthcare or AML rules in finance
- Data-First Architecture: Teams should version both data and metadata for full reproducibility, use branching to safely test changes and new models, make data easily discoverable with rich metadata, and track data and pipeline changes end-to-end
- Enterprise-Grade Tooling: Solutions that can handle the scale and complexity of modern enterprise data while maintaining Git-like simplicity
Looking Ahead
OpenAI’s gpt-oss release marks a pivotal moment where open-source AI models meet enterprise infrastructure reality. “As AI data becomes larger, messier and more mission-critical, lakeFS delivers the control layer needed to build, iterate and ship with confidence. Built for the scale and complexity of modern enterprises, lakeFS is not just a smart solution, it’s a foundational layer for reproducibility, collaboration and trust in the AI era” according to Ido Hart, Partner at Maor Investments.
Organizations that recognize this moment and invest in robust, self-managed AI infrastructure—anchored by proper data management and version control—will be best positioned to capitalize on the AI revolution while meeting their regulatory, security, and operational requirements.
The question isn’t whether enterprises will adopt powerful open-source AI models, but whether they’ll have the data infrastructure foundation to make them truly valuable. In this new landscape, the best LLM is indeed useless without proper data management, and the winners will be those who solve both sides of this equation.
Learn more about how lakeFS can help your organization build AI-ready data infrastructure.


