
How Arm Powers Its Data Management Infrastructure with lakeFS
Arm, a global leader in semiconductor design, manages petabytes of data across internal teams, partners, and global environments.
Facing challenges with data retention costs, governance, and engineering velocity, Arm turned to lakeFS for scalable data version control.
With lakeFS, Arm implemented automated data cleaning, avoided costly data duplication, streamlined engineering workflows, and established a robust governance framework to manage data across distributed teams. The result: faster go-to-market, reduced storage costs, improved development velocity, and stronger data governance.
The company
Arm provides the industry’s most efficient and highest-performing compute platform, with more than 310 billion Arm®-based chips shipped to date. The world’s biggest technology companies, including AWS, Microsoft, Google, Meta, NVIDIA and Samsung, are innovating on Arm, all with a shared purpose of meeting the insatiable compute demand required to deliver the unprecedented capabilities and experiences that AI promises.
The challenges
Arm’s Engineering IT team is responsible for managing petabytes of data generated by internal processes, as well as sensitive datasets shared by customers and partners. As the team’s data needs grew, so did the challenge of scaling a secure, cost-effective, and high-performing data infrastructure.
High Cost of Data Retention
The sheer volume of data generated led to ballooning storage costs. Without effective data lifecycle and retention strategies, the accumulation of unnecessary or redundant datasets became an operational and financial burden.
Engineering Velocity
The lack of efficient tooling for dataset branching, tracking, and reproducibility created friction in development workflows. Engineers spent significant time on manual data handling — slowing experimentation, model iteration, and delivery pipelines.
Data Governance and Security
Given the variety of data sources and the globally distributed nature of Arm’s internal teams, security and governance (ensuring the right data is accessible to the right people at the right time) were essential. Arm required versioned, auditable data operations with traceability, reproducibility, and robust lifecycle controls to maintain oversight across teams and projects.
Infrastructure for Global Scale and Growth
As Arm expanded its data-driven initiatives, including AI workloads and cross-team experimentation, it needed to support scalable, collaborative, and high-throughput data operations across global teams.
Adopted solution
Arm selected lakeFS as a pillar of its modern data management stack. lakeFS delivers Git-like version control for data, enabling safe experimentation, reproducibility, and governance at scale.
“The adoption of AI-defined workloads is critical to staying on the cutting edge of innovation, but its benefits cannot be realized without a reliable data infrastructure that’s purpose-built for this era. lakeFS, which is optimized for performance on Arm compute platforms, plays an important role in helping us scale data operations and manage complex pipelines with confidence,” said Krzysztof Zylak, Director, Engineering Solution Architecture at Arm.
Arm’s engineering and data teams integrated lakeFS to address key pain points in the following ways:
Efficient Data Cleaning
By leveraging lakeFS’s automated garbage collection and data retention features, Arm can programmatically identify and remove outdated or redundant datasets. This significantly reduces unnecessary storage, helping to cut data retention costs.
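To illustrate how retention-based garbage collection can decide what is safe to delete, here is a minimal sketch under simplifying assumptions. It is a toy in-memory model, not lakeFS’s implementation or API. The `Commit` type, the `collectable_objects` function, and the per-branch retention dictionary are all hypothetical. The rule assumed here: a branch’s HEAD commit is always kept, older commits are kept only while they fall inside that branch’s retention window, and an object may be deleted only if no kept commit still references it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Commit:
    """Hypothetical immutable commit: its parent, timestamp, and referenced objects."""
    parent: Optional[str]
    created: datetime
    objects: FrozenSet[str]  # physical object addresses referenced by this commit


def collectable_objects(
    commits: Dict[str, Commit],
    branch_heads: Dict[str, str],
    retention_days: Dict[str, int],
    default_days: int,
    now: datetime,
) -> Set[str]:
    """Return object addresses referenced only by expired commits (toy model)."""
    kept: Set[str] = set()
    for branch, head in branch_heads.items():
        cutoff = now - timedelta(days=retention_days.get(branch, default_days))
        commit_id: Optional[str] = head
        is_head = True
        while commit_id is not None:
            commit = commits[commit_id]
            # Always keep the branch HEAD; keep ancestors only inside the retention window.
            if is_head or commit.created >= cutoff:
                kept.add(commit_id)
            is_head = False
            commit_id = commit.parent
    live = set().union(*(commits[c].objects for c in kept))
    every_object = set().union(*(c.objects for c in commits.values()))
    return every_object - live


# Hypothetical history: obj-b was dropped from the dataset in c2.
now = datetime(2024, 6, 30)
commits = {
    "c1": Commit(None, datetime(2024, 5, 1), frozenset({"obj-a", "obj-b"})),
    "c2": Commit("c1", datetime(2024, 6, 1), frozenset({"obj-a", "obj-c"})),
    "c3": Commit("c2", datetime(2024, 6, 25), frozenset({"obj-a", "obj-c", "obj-d"})),
}

only_main = collectable_objects(commits, {"main": "c3"}, {"main": 21}, 14, now)
with_old_branch = collectable_objects(
    commits, {"main": "c3", "experiment": "c1"}, {"main": 21}, 14, now
)
print(only_main)        # {'obj-b'}
print(with_old_branch)  # set(): a branch still pointing at c1 keeps obj-b alive
```

The second call shows the design trade-off in this model: retention is evaluated per branch, so a long-lived branch pointing at old data keeps that data from being collected, which is why branch hygiene matters as much as the retention settings themselves.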
Eliminated Data Duplication
lakeFS enables branch-based workflows where individual teams can work on isolated versions of a dataset without physically copying or duplicating data. This versioning model eliminates sync overhead and prevents costly data duplication, even in complex, multi-team environments.
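The idea that a branch costs no extra storage can be sketched with a toy model, assuming a design in which commits are immutable manifests mapping logical paths to content-addressed objects and branches are just pointers to commits. The `ToyRepo` class and its methods are hypothetical and illustrate the concept only; they are not the lakeFS SDK or its internal data model.

```python
import hashlib
from typing import Dict


class ToyRepo:
    """Hypothetical in-memory model of zero-copy branching (not lakeFS's implementation)."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}              # physical address -> object bytes
        self.commits: Dict[str, Dict[str, str]] = {}   # commit id -> {path: address}
        self.branches: Dict[str, str] = {}             # branch name -> commit id

    def _put(self, data: bytes) -> str:
        # Content addressing: identical bytes are stored once.
        address = hashlib.sha256(data).hexdigest()
        self.store.setdefault(address, data)
        return address

    def commit(self, branch: str, changes: Dict[str, bytes]) -> str:
        parent = self.branches.get(branch)
        manifest = dict(self.commits[parent]) if parent else {}
        for path, data in changes.items():
            manifest[path] = self._put(data)
        commit_id = hashlib.sha256(
            repr((parent, sorted(manifest.items()))).encode()
        ).hexdigest()[:12]
        self.commits[commit_id] = manifest
        self.branches[branch] = commit_id
        return commit_id

    def create_branch(self, name: str, source: str) -> None:
        # Branching copies one commit pointer; no data objects are duplicated.
        self.branches[name] = self.branches[source]

    def read(self, branch: str, path: str) -> bytes:
        return self.store[self.commits[self.branches[branch]][path]]


repo = ToyRepo()
repo.commit("main", {"sensors/day1.csv": b"a,b\n1,2\n", "sensors/day2.csv": b"a,b\n3,4\n"})
repo.create_branch("experiment", source="main")
objects_after_branch = len(repo.store)  # still 2: the branch added no objects

repo.commit("experiment", {"sensors/day2.csv": b"a,b\n3,5\n"})
print(len(repo.store))                          # 3: only the changed file was written
print(repo.read("main", "sensors/day2.csv"))    # main is unaffected by the experiment
```

In this model, the storage cost of an isolated team branch grows only with the files that team actually changes, which is the property the paragraph above relies on.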
Engineering Efficiency
The immutability and reproducibility guarantees provided by lakeFS allow teams to reuse the results of expensive computations. Coupled with automated data cleanup, this has improved developer productivity, freeing engineers from manual data management tasks and reducing the risk of errors.
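One way immutability enables reuse is to cache results by commit ID rather than by branch name: a commit’s contents never change, so a result computed on it stays valid, while a branch is a mutable pointer that must be resolved first. The sketch below assumes that pattern. The `CommitKeyedCache` class, the snapshot data, and the branch mapping are hypothetical and are not part of lakeFS.

```python
from typing import Callable, Dict, Tuple

Dataset = Tuple[int, ...]


class CommitKeyedCache:
    """Hypothetical helper that reuses step results per (step, commit id) pair."""

    def __init__(self, snapshots: Dict[str, Dataset], branches: Dict[str, str]) -> None:
        self.snapshots = snapshots  # immutable commit id -> dataset contents
        self.branches = branches    # mutable branch name -> commit id
        self.results: Dict[Tuple[str, str], float] = {}
        self.compute_calls = 0

    def run(self, step_name: str, step: Callable[[Dataset], float], ref: str) -> float:
        # Branch names can move, so resolve the ref to an immutable commit id first.
        commit_id = self.branches.get(ref, ref)
        key = (step_name, commit_id)
        if key not in self.results:
            self.compute_calls += 1
            self.results[key] = step(self.snapshots[commit_id])
        return self.results[key]


def mean(values: Dataset) -> float:
    return sum(values) / len(values)


cache = CommitKeyedCache(
    snapshots={"c1": (3, 1, 4, 1, 5), "c2": (3, 1, 4, 1, 5, 9, 2, 6)},
    branches={"main": "c1", "experiment": "c1"},
)
r1 = cache.run("mean", mean, "main")        # computed on c1
r2 = cache.run("mean", mean, "experiment")  # same commit c1, so the result is reused
cache.branches["experiment"] = "c2"         # a new commit lands on experiment
r3 = cache.run("mean", mean, "experiment")  # new commit, so the step runs again
print(r1, r2, r3, cache.compute_calls)
```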
Data Governance
lakeFS introduces fine-grained version control across the entire data lifecycle. Teams can track changes, audit usage, and enforce retention and access policies systematically. This provides Arm with the governance clarity and process control needed to manage datasets collaboratively across departments and regions, while ensuring consistency and reproducibility in every environment.
Results
Arm realized measurable improvements across critical operational dimensions:
Faster Go-to-Market
Teams can quickly and safely access isolated, reproducible datasets, enabling faster experimentation and delivery cycles.
Accelerated Development Velocity
Engineers spend less time on manual cleanup and dataset configuration, allowing more focus on high-value innovation.
Significant Reduction in Storage Costs
Automated retention and duplication prevention capabilities drove major savings in data storage costs and operational overhead.
Improved Data Governance and Compliance
Versioned data workflows, auditable changes, and lifecycle enforcement brought greater control and security to data operations.
