Fast data loading for deep learning workloads with lakeFS Mount
Virtually mount a lakeFS remote repository from your object storage onto a local directory and interact with the data as if it resides on your local filesystem
Use your favorite tools and libraries without disruption.
Reduce object storage roundtrips by 90%.
Accelerate your AI and ML workloads in no time!
Take any path from a lakeFS repository, on any branch, commit or tag, then mount and accelerate your AI/ML workloads!
Seamlessly scale from a few local files to millions without changing your tools or workflows. Use the same code from early experimentation all the way to production.
lakeFS Mount handles the most demanding workloads, supporting billions of files with fast data fetching. Choose between lazy and eager fetching strategies to optimize your GPU utilization.
When mounting a path within a Git repository, Git automatically tracks which version of the data was mounted, linking code and input data together. You can revisit any version of the code alongside the exact data it ran on.
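Because a mount point behaves like any local directory, existing data-loading code needs no changes to move from a handful of local files to a mounted repository path. A minimal sketch, assuming a hypothetical mount point at `./data` (the function name and file suffixes here are illustrative, not part of lakeFS Mount):

```python
import os

def iter_dataset_files(root, suffixes=(".jpg", ".png", ".npy")):
    """Walk a directory tree and yield paths to data files.

    Works identically whether `root` is a plain local directory or a
    lakeFS Mount mount point -- the mount behaves like a regular
    filesystem path.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffixes):
                yield os.path.join(dirpath, name)

# Hypothetical usage: "./data" stands in for a lakeFS Mount mount point.
# for path in iter_dataset_files("./data"):
#     with open(path, "rb") as f:
#         payload = f.read()  # with lazy fetching, data is downloaded on first read
```

The same loop works unchanged from early experimentation on a few local files through to production runs over a mounted repository.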
lakeFS enabled us to streamline and run 200+ dbt models in production, increase data deployment velocity, efficiently reproduce ML experiments, increase productivity of the data teams, and adhere to FDA compliance requirements
Watch an in-depth walkthrough of how to get started in this 7-minute tutorial.
lakeFS uses zero-copy branching to avoid data duplication. Creating a new branch is a metadata-only operation: no objects are actually copied. Only when an object changes does lakeFS store a new version of it in the object storage.
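The idea can be shown with a toy model. This is purely illustrative (the class, IDs, and storage layout below are invented for the sketch and bear no relation to lakeFS internals): a branch is just a mapping of logical paths to object IDs, so branching copies pointers, never data.

```python
class ZeroCopyRepo:
    """Toy model of zero-copy branching (illustrative only)."""

    def __init__(self):
        self.objects = {}              # object_id -> data (the object store)
        self.branches = {"main": {}}   # branch -> {logical path: object_id}

    def write(self, branch, path, data):
        # Writing stores a new object version and repoints the branch metadata.
        object_id = f"obj{len(self.objects)}"
        self.objects[object_id] = data
        self.branches[branch][path] = object_id

    def create_branch(self, name, source):
        # Metadata-only operation: copy path -> object pointers, not data.
        self.branches[name] = dict(self.branches[source])
```

Creating a branch adds nothing to the object store; only a subsequent write on that branch stores a new object version, while the source branch keeps pointing at the original.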
The data you wish to version control stays in place on your object storage. Onboarding is a metadata-only operation: lakeFS creates metadata for your existing objects without moving them. New data written through lakeFS is stored in the bucket you configure for it on your object storage.
We are extremely responsive on our Slack channel, and we make sure to prioritize the most pressing issues for the community. For SLA-based support, please contact us at support@treeverse.io.
lakeFS Mount is available for lakeFS Enterprise (cloud and on-prem) customers. You first need to contact our team and once your setup is complete, you’ll receive the steps necessary to access the lakeFS Mount binary.
lakeFS Mount supports Linux and macOS. Windows support is on the roadmap.
You can use lakeFS’s existing Role-Based Access Control mechanism, which includes repository and path-level policies. lakeFS Mount translates filesystem operations into lakeFS API operations and authorizes them based on these policies.
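To make the translation concrete, here is a simplified sketch of a path-level authorization check. The policy shape, field names, and wildcard matching below are stand-ins for illustration, not the actual lakeFS policy schema or enforcement logic:

```python
from fnmatch import fnmatch

# Hypothetical path-level policy: allow reads under one prefix of one repo.
POLICY = {
    "action": "fs:ReadObject",
    "resource": "my-repo/datasets/images/*",   # invented repo and prefix
    "effect": "allow",
}

def authorized(action, repo, path, policy=POLICY):
    """Return True if the policy allows `action` on repo/path."""
    resource = f"{repo}/{path}"
    return (policy["effect"] == "allow"
            and action == policy["action"]
            and fnmatch(resource, policy["resource"]))
```

Under such a policy, a filesystem read of a file beneath the allowed prefix would be authorized, while a read outside it would be denied.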
When using lakeFS Mount, scale limits are driven by the volume of data the local machine actually accesses, not by the total size of the dataset under the mounted prefix. This is because lakeFS Mount downloads lazily, fetching only the files that are accessed. Directory listing currently performs efficiently only for prefixes containing fewer than 8,000 objects, and we are working to raise this limit.
Ensure your cache size is large enough to accommodate the volume of files being accessed.
It is perfectly safe to mount a lakeFS path within a Git repository. lakeFS Mount prevents Git from adding mounted objects to the repository (e.g., when running git add -A) by adding a virtual .gitignore file to the mounted directory.
The .gitignore file instructs Git to ignore all files except .everest/source. When lakeFS Mount is invoked without an explicit lakeFS URI, it looks for a .everest/source file in the destination directory and reads the lakeFS URI from there. Since .everest/source is in source control, it will mount the same lakeFS commit every time!
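The lookup described above can be sketched in a few lines. This is a hypothetical reimplementation for illustration, assuming only what the paragraph states (the function name, error handling, and example URI are invented):

```python
import os

def read_mount_source(destination):
    """Read the pinned lakeFS URI from .everest/source, if present.

    Mirrors the described behavior: when no URI is given, the file
    .everest/source in the destination directory supplies it, pinning
    every mount to the same lakeFS commit.
    """
    source_file = os.path.join(destination, ".everest", "source")
    if not os.path.exists(source_file):
        return None
    with open(source_file) as f:
        return f.read().strip()
```

Because the file travels with the Git repository, every clone resolves to the identical lakeFS URI, and therefore the identical data version.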
Our creators and solution architects are happy to demonstrate how lakeFS works and answer any question that you may have.