Fast data loading for deep learning workloads with lakeFS Mount
Virtually mount a lakeFS remote repository from your object storage onto a local directory and interact with the data as if it resides on your local filesystem
lakeFS enabled us to streamline and run 200+ dbt models in production, increase data deployment velocity, efficiently reproduce ML experiments, increase productivity of the data teams, and adhere to FDA compliance requirements
Elevate your deep learning operations
Take any path from a lakeFS repository, on any branch, commit or tag, then mount and accelerate your AI/ML workloads!
Handle large scale data without changing work habits
Seamlessly scale from a few local files to millions without changing your tools or workflows. Use the same code from early experimentation all the way to production.
Improved data loading efficiency
lakeFS Mount handles the most demanding workloads, supports billions of files, and offers fast data fetching. Choose from multiple fetching strategies, whether “lazy” or “eager”, to optimize your GPU utilization.
Built-in machine learning data reproducibility
When you mount a path within a Git repo, Git automatically tracks which version of the data was mounted, linking the code and its input data together. You can review the exact version of the data that any version of the code ran on.
How to get started with lakeFS Mount
- Request access
Request access to lakeFS Mount and get a token
- Schedule onboarding
Schedule an onboarding session to mount a path to your local directory
- Start mounting
Accelerate your AI and ML workloads in no time!
Watch how lakeFS Mount works
Watch an in-depth walkthrough of how to get started in this 7-minute tutorial
Frequently Asked Questions
lakeFS uses zero-copy branching to avoid data duplication. That is, creating a new branch is a metadata-only operation: no objects are actually copied. Only when an object changes does lakeFS create another version of the data in the storage.
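The idea of a metadata-only branch can be illustrated with a toy model. This is a minimal sketch, not how lakeFS is implemented: the `ToyRepo` class, its method names, and the object paths are all hypothetical, and a plain dictionary stands in for both the metadata and the object storage.

```python
class ToyRepo:
    """Toy model of zero-copy branching (illustrative only, not lakeFS internals)."""

    def __init__(self):
        self.storage = {}             # physical address -> bytes (stands in for object storage)
        self.branches = {"main": {}}  # branch name -> {logical path: physical address}
        self._next_id = 0

    def put(self, branch, path, data):
        # Every write creates a new physical object; existing objects are never mutated.
        address = f"obj-{self._next_id}"
        self._next_id += 1
        self.storage[address] = data
        self.branches[branch][path] = address

    def create_branch(self, name, source):
        # Metadata-only: copy the path -> address mapping, not the objects themselves.
        self.branches[name] = dict(self.branches[source])

    def get(self, branch, path):
        return self.storage[self.branches[branch][path]]


repo = ToyRepo()
repo.put("main", "images/cat.jpg", b"v1")
repo.put("main", "images/dog.jpg", b"v1")

repo.create_branch("experiment", "main")
objects_after_branch = len(repo.storage)   # branching added no objects

repo.put("experiment", "images/cat.jpg", b"v2")
objects_after_change = len(repo.storage)   # only the changed object was added
```

In this model both branches keep pointing at the same stored object for `images/dog.jpg`, and `main` still reads the original `images/cat.jpg` after the change on `experiment`.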
The data you wish to version control will stay in place on your object storage. Onboarding data to lakeFS is done by creating the lakeFS metadata for your existing data while the data stays in place. While writing new data using lakeFS, the bucket you define for lakeFS on your object storage will be used to store that data.
We are extremely responsive on our Slack channel, and we make sure to prioritize the most pressing issues for the community. For SLA-based support, please contact us at support@treeverse.io.
lakeFS Mount is available for lakeFS Enterprise (cloud and on-prem) customers. You first need to contact our team; once your setup is complete, you’ll receive the steps necessary to access the lakeFS Mount binary.
lakeFS Mount supports Linux and macOS. Windows support is on the roadmap.
You can use lakeFS’s existing Role-Based Access Control mechanism, which includes repository and path-level policies. lakeFS Mount translates filesystem operations into lakeFS API operations and authorizes them based on these policies.
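As a rough illustration of path-level authorization, the sketch below checks a filesystem operation against a policy document. It rests on several assumptions: the mapping from filesystem operations to actions, the `is_allowed` helper, the deny-overrides rule, and the repository and policy names are all hypothetical, and the policy is only modeled on the general shape of lakeFS policies (an action list, an effect, and a resource pattern). It is not the evaluation logic lakeFS or lakeFS Mount actually runs.

```python
from fnmatch import fnmatchcase

# Assumed mapping from filesystem operations to lakeFS action names (illustrative).
FS_OP_TO_ACTION = {"read": "fs:ReadObject", "write": "fs:WriteObject"}

# Hypothetical policy: read-only access to objects under images/ in example-repo.
policy = {
    "id": "ExampleReadImages",
    "statement": [
        {
            "action": ["fs:ReadObject"],
            "effect": "allow",
            "resource": "arn:lakefs:fs:::repository/example-repo/object/images/*",
        }
    ],
}


def is_allowed(policy, op, repository, path):
    """Return True if the policy allows the operation (toy evaluator)."""
    action = FS_OP_TO_ACTION[op]
    resource = f"arn:lakefs:fs:::repository/{repository}/object/{path}"
    allowed = False
    for statement in policy["statement"]:
        matches = any(
            fnmatchcase(action, pattern) for pattern in statement["action"]
        ) and fnmatchcase(resource, statement["resource"])
        if matches and statement["effect"] == "deny":
            return False  # assumption: an explicit deny always wins
        if matches and statement["effect"] == "allow":
            allowed = True
    return allowed


can_read_image = is_allowed(policy, "read", "example-repo", "images/cat.jpg")
can_write_image = is_allowed(policy, "write", "example-repo", "images/cat.jpg")
can_read_labels = is_allowed(policy, "read", "example-repo", "labels/train.csv")
```

Under this policy, reading a file below `images/` succeeds, while writing to it or reading from another prefix is rejected.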
When using lakeFS Mount, the volume of data accessed by the local machine influences the scale limitations more than the total size of the dataset under the mounted prefix. This is because lakeFS Mount uses a lazy downloading approach, meaning it only downloads the files that are actually accessed. Listing currently performs efficiently only for prefixes containing fewer than 8,000 objects, but we are working to increase this limit.
Ensure your cache size is large enough to accommodate the volume of files being accessed.
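To see why the accessed volume, rather than the dataset size, drives cache sizing, consider a toy lazy cache. This is a sketch under stated assumptions: the `LazyCache` class, the least-recently-used eviction policy, and the byte sizes are illustrative choices and do not describe how lakeFS Mount manages its cache.

```python
from collections import OrderedDict


class LazyCache:
    """Toy lazy-fetching cache with least-recently-used eviction (illustrative)."""

    def __init__(self, fetch, max_bytes):
        self.fetch = fetch            # callable: path -> bytes (hypothetical remote read)
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # path -> bytes, least recently used first
        self.used = 0
        self.remote_reads = 0

    def read(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as recently used
            return self.entries[path]
        data = self.fetch(path)             # downloaded only on first access
        self.remote_reads += 1
        self.entries[path] = data
        self.used += len(data)
        # Evict least recently used entries until the cache fits again.
        while self.used > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return data


# 1000 remote objects of 100 bytes each; the job only touches three of them.
remote = {f"part-{i}": bytes(100) for i in range(1000)}
working_set = ["part-0", "part-1", "part-2"]


def count_remote_reads(max_bytes, epochs=3):
    cache = LazyCache(remote.__getitem__, max_bytes)
    for _ in range(epochs):
        for path in working_set:
            cache.read(path)
    return cache.remote_reads


reads_large_cache = count_remote_reads(max_bytes=300)  # working set fits
reads_small_cache = count_remote_reads(max_bytes=200)  # working set does not fit
```

With a cache that holds the whole working set, each file is fetched once across all three epochs. With a cache one file too small, every access in this access pattern becomes a remote read, even though only three of the 1000 objects are ever touched.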
It is perfectly safe to mount a lakeFS path within a Git repository. lakeFS Mount prevents Git from adding mounted objects to the Git repository (for example, when running git add -A) by adding a virtual .gitignore file to the mounted directory.
The .gitignore file also instructs Git to ignore all files except .everest/source. When no lakeFS URI is given for a mount, lakeFS Mount looks for a .everest/source file in the destination folder and reads the lakeFS URI from there. Since .everest/source is in source control, it will mount the same lakeFS commit every time!
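A lakeFS URI has the form `lakefs://<repository>/<ref>/<path>`, where the ref can be a branch, a tag, or a commit ID. The sketch below splits such a URI into its parts. The `parse_lakefs_uri` helper and the repository name are hypothetical, and the sketch makes no assumption about how the .everest/source file itself is laid out.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class LakeFSUri(NamedTuple):
    repository: str
    ref: str    # branch name, tag, or commit ID
    path: str   # may be empty when the URI points at the root of the ref


def parse_lakefs_uri(uri: str) -> LakeFSUri:
    """Split lakefs://<repository>/<ref>/<path> into its components."""
    parsed = urlparse(uri)
    if parsed.scheme != "lakefs" or not parsed.netloc:
        raise ValueError(f"not a lakeFS URI: {uri!r}")
    ref, _, path = parsed.path.lstrip("/").partition("/")
    if not ref:
        raise ValueError(f"missing ref in lakeFS URI: {uri!r}")
    return LakeFSUri(parsed.netloc, ref, path)


# Hypothetical repository and path, referenced by branch name.
by_branch = parse_lakefs_uri("lakefs://example-repo/main/datasets/images/")
```

A URI whose ref is a commit ID always resolves to the same data, whereas a branch name follows the branch as new commits land, which is why recording the mounted commit alongside the code makes a run repeatable.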
Request access to a lakeFS Mount token
Our creators and solution architects are happy to demonstrate how lakeFS works and answer any questions that you may have
- Mount a lakeFS repo from your object storage as a local filesystem
- Work concurrently with lakeFS and your favorite tools and libraries without any disruption
- Handle the most demanding workloads in no time!