Many lakeFS customers in the aerospace, automotive, healthcare & life sciences, and manufacturing industries are also heavy users of MATLAB. lakeFS solves a range of data ops challenges for these organizations by serving as a “control plane” for AI-ready data – versioning complex data pipelines, tracking metadata and lineage, and enabling team collaboration through git-like operations for petabyte-scale multi-modal data. These same industries use MATLAB as their computational environment for data science, machine learning, and simulation by leveraging its specialized toolboxes – collections of pretrained models, functions, and apps that extend the core platform for specific technical computing tasks like signal processing, image processing, and control systems.
While MATLAB provides powerful industry-specific tools for data scientists and ML practitioners, the lack of data version control creates inefficiencies across the data ops and ML ops lifecycle. Data discovery breaks down when teams maintain multiple dataset versions without clear provenance. Data bloat accumulates and iteration slows because creating isolated environments means duplicating data and waiting for copies to complete. Versioning complex multi-modal data formats that combine large unstructured files with metadata, labels, and annotations becomes intractable when multiple teams modify shared datasets with no mechanism to merge changes or resolve conflicts. Reproducibility becomes impossible when the exact data state that trained a production model no longer exists or can’t be identified with certainty.
lakeFS addresses these challenges through Git semantics applied to data. Atomic commits create a single source of truth with complete lineage—every dataset state is tracked with metadata describing what changed and why. Zero-copy branching enables instant creation of isolated environments without storage duplication, making experimentation as fast as changing a pointer. Teams discover data through queryable commit history rather than filesystem archaeology. Collaboration works through explicit branch and merge operations, bringing the same conflict resolution and review processes from code to data. Reproducibility is guaranteed because every commit is immutable—checking out a six-month-old commit retrieves the exact data state, not an approximation.
Integrating lakeFS with MATLAB
Integrating MATLAB with lakeFS allows MATLAB users to take advantage of data version control from within the MATLAB environment. The integration works by wrapping lakeFS’s command-line tools – lakectl and Everest – in MATLAB classes that provide native MATLAB syntax for version control operations.
How the Integration Works
lakeFS provides two primary interfaces: lakectl (a CLI for version control operations) and Everest (a file system mount tool for direct data access). MATLAB can interact with both through its system() function, which executes shell commands and captures their output.
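As a minimal sketch of that mechanism (the helper name here is illustrative, not part of the shipped wrapper classes), wrapping lakectl through system() looks like this:

```matlab
% Illustrative helper: run a lakectl command from MATLAB
% and capture its text output via system().
function output = run_lakectl(args)
    [status, output] = system(['lakectl ' args]);
    if status ~= 0
        error('lakefs:commandFailed', 'lakectl exited with %d:\n%s', status, output);
    end
end
```

A call such as run_lakectl('branch list lakefs://my-repo') then returns the listing as a string that the wrapper can parse into MATLAB data structures.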
The integration consists of two MATLAB classes:
lakefs.m handles version control operations by wrapping lakectl commands. When you call a MATLAB method like lakefs.create_branch('my-repo', 'experiment-1', 'main'), the class constructs the equivalent lakectl command (lakectl branch create lakefs://my-repo/experiment-1 -s lakefs://my-repo/main), executes it via system(), and parses the output into MATLAB-friendly formats like tables or structs.
everest.m manages file system mounting by wrapping Everest commands. Methods like everest.mount('lakefs://repo/branch/', 'data/') execute the Everest mount command, wait for the mount to complete, and verify the mounted directory is accessible. This allows you to work with lakeFS data using standard MATLAB file I/O functions.
Here’s what a typical workflow looks like in MATLAB:
% Create an isolated branch for experimentation
lakefs.create_branch('ml-project', 'experiment-1', 'main');
% Mount the branch data
everest.mount('lakefs://ml-project/experiment-1/data/', 'local_data/');
% Work with data using standard MATLAB commands
data = load('local_data/training.mat');
model = train_model(data);
% Unmount
everest.umount('local_data/');
% Commit results with metadata
metadata = struct('accuracy', '0.94', 'training_time', '120.5');
lakefs.commit('ml-project', 'experiment-1', ...
    'Experiment 1: baseline model', 'metadata', metadata);

The key advantage of this approach is that it preserves MATLAB’s interactive workflow while adding version control capabilities. You don’t need to learn new tools or change how you write MATLAB code – the integration handles the translation between MATLAB and lakeFS.
Implementation Details
The wrapper classes handle several important details:
Command Construction: Methods take MATLAB-native inputs (strings, structs, key-value pairs) and construct properly formatted lakectl or Everest commands. This includes handling special characters, building complex flag combinations, and formatting metadata as JSON.
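For example, a commit with metadata might be assembled like this (a sketch: lakectl commit’s repeatable --meta key=value flag is real, but the surrounding code is illustrative rather than the wrapper’s actual implementation):

```matlab
% Build a lakectl commit command from MATLAB-native inputs (sketch).
repo = 'ml-project'; branch = 'experiment-1';
msg = 'Experiment 1: baseline model';
metadata = struct('accuracy', '0.94', 'training_time', '120.5');

% Turn each struct field into a --meta key=value flag
metaFlags = '';
for f = fieldnames(metadata)'
    key = f{1};
    metaFlags = [metaFlags sprintf(' --meta %s=%s', key, metadata.(key))]; %#ok<AGROW>
end
cmd = sprintf('lakectl commit lakefs://%s/%s -m "%s"%s', repo, branch, msg, metaFlags);
```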
Output Parsing: lakectl returns data in table or log format. The wrapper classes parse this output into MATLAB tables, allowing you to filter, sort, and analyze version history using familiar MATLAB operations.
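A parsing step might look like the following sketch (the tab-separated layout and sample values are assumptions for illustration; the real wrapper adapts to lakectl’s actual output format):

```matlab
% Parse line-oriented lakectl output into a MATLAB table (sketch).
raw = sprintf('main\tc0ffee123456\nexperiment-1\tdeadbeef7890\n');  % stand-in for system() output
rows = strsplit(strtrim(raw), '\n');
parts = cellfun(@(r) strsplit(r, '\t'), rows, 'UniformOutput', false);
name = cellfun(@(p) p{1}, parts, 'UniformOutput', false)';
id   = cellfun(@(p) p{2}, parts, 'UniformOutput', false)';
branches = table(name, id);
% branches can now be filtered and sorted like any MATLAB table
```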
Error Handling: When operations fail, the wrappers provide clear error messages that explain what went wrong and how to fix it, rather than raw command-line errors.
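A sketch of that pattern, assuming nothing beyond system()’s exit status and captured output:

```matlab
% Turn a non-zero lakectl exit status into an actionable MATLAB error (sketch).
[status, output] = system('lakectl branch create lakefs://my-repo/exp-1 -s lakefs://my-repo/main');
if status ~= 0
    if contains(output, 'not found')
        error('lakefs:notFound', ...
            'Repository or source branch does not exist. Check the name with "lakectl repo list".');
    else
        error('lakefs:commandFailed', 'lakectl reported:\n%s', output);
    end
end
```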
Environment Management: The classes can switch between multiple lakeFS environments (local development, cloud production) by setting environment variables that lakectl reads, allowing seamless transitions without reconfiguring credentials.
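For instance, switching to a production instance can be as simple as setting lakectl’s documented environment-variable overrides (the variable names below come from the lakectl documentation; the endpoint and credential values are placeholders):

```matlab
% Point lakectl at a different lakeFS environment via environment variables (sketch).
setenv('LAKECTL_SERVER_ENDPOINT_URL', 'https://prod.example.lakefs.io');
setenv('LAKECTL_CREDENTIALS_ACCESS_KEY_ID', getenv('PROD_LAKEFS_KEY_ID'));
setenv('LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY', getenv('PROD_LAKEFS_SECRET'));
% Subsequent system('lakectl ...') calls from MATLAB now target production.
```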
Example: Tracking Experiments
Here’s a more complete example showing how the integration supports experiment tracking:
% List existing branches
branches = lakefs.list_branches('research-data');
disp(branches);
% Create experiment branch (zero-copy, instant)
lakefs.create_branch('research-data', 'experiment-2', 'main');
% View commit history
commits = lakefs.log('research-data', 'main', 'amount', 5);
for i = 1:height(commits)
fprintf('%s: %s\n', commits.id{i}(1:12), commits.message{i});
end
% Compare two data versions
changes = lakefs.diff('research-data', 'v1.0', 'v2.0');  % avoid shadowing MATLAB's built-in diff
fprintf('Changed files: %d\n', height(changes));
% Tag important milestones
lakefs.create_tag('research-data', 'paper-submission-2024', 'main');

The integration makes these operations feel native to MATLAB. You don’t need to leave the MATLAB environment, manually construct CLI commands, or parse text output – the wrapper handles those details.
Reproducibility in Practice
The real power becomes clear when you need to reproduce past work. Consider this scenario: you trained a model six months ago and need to verify the results. With the integration:
% Find the commit used for training
commits = lakefs.log('ml-project', 'main', 'amount', 50);
training_commit = commits.id{find(contains(commits.message, 'production model v2.1'), 1)};
% Download exact data state from that commit
lakefs.download('ml-project', training_commit, ...
'data/training.mat', 'reproduced_data.mat');
% Load and verify
data = load('reproduced_data.mat');
fprintf('Training samples: %d\n', size(data.X, 1));
% Check metadata from that commit
commit_info = lakefs.log('ml-project', training_commit, 'amount', 1);
disp(commit_info.metadata{1});

You get exactly the data that was used for training, along with complete metadata about how it was generated. This works because lakeFS tracks every commit immutably – the data state from six months ago is guaranteed to be identical to what it was then.
Getting Started
The MATLAB-lakeFS integration requires:
- lakeFS server (local or cloud instance)
- lakectl CLI installed and configured
- Everest for mounting capabilities
- MATLAB wrapper classes (lakefs.m and everest.m)
If you’re interested in using lakeFS with MATLAB, reach out to the lakeFS team for assistance with setup and configuration. The team can help you:
- Install and configure lakectl for your environment
- Set up Everest mounting with appropriate permissions
- Configure the MATLAB wrapper classes for your lakeFS instance
- Establish best practices for your specific workflows (ML training, regulatory compliance, team collaboration)


