

Oz Katz (Author)

Oz Katz is the CTO and Co-founder of lakeFS, an...

Last updated on August 20, 2024

We are excited to announce the launch of lakeFS Mount, a powerful new lakeFS client designed to simplify your data workflows. 

lakeFS Mount allows you to mount a lakeFS repository (or a path within one) as a local directory on any workstation or server, bringing unprecedented ease and efficiency to your data operations.  

But what exactly does mounting mean? Mounting a file system makes data stored in a remote location (like an object store) appear as if it’s part of the local file system, so you can access it without installing and configuring SDKs or writing custom data loading code.


Who is lakeFS Mount for?

lakeFS Mount is tailor-made for data scientists and machine learning practitioners. Whether you are prototyping models, running complex experiments, or training large models from scratch, lakeFS Mount is here to make your life easier. Let’s take a look at three common benefits in more detail.

Simplifying Workflows with Seamless Integration

One of the standout features of lakeFS Mount is its ability to integrate smoothly with your existing code and workflows. There’s no need for extensive modifications or rewrites. By simply mounting a lakeFS repository, any existing code that can read and write files can now access lakeFS. This means you can continue using your favorite tools and libraries without any disruption.

In practice, this means that most machine learning projects can scale from ideation and early experimentation, where small datasets live in a local directory, all the way to production, where large distributed storage is required, using the exact same code.

This reduces “it worked on my machine” surprises as projects move from development to production. Without a mount, code typically has to change to work with more complex forms of storage, not to mention cases where the libraries in use simply do not support the required object store interface (or support it with poor performance).
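To make the “same code” point concrete, here is a minimal sketch of ordinary file-reading Python. The function name `iter_samples` and both directory paths are hypothetical; the only assumption is that a lakeFS path has been mounted at some local directory. Nothing in the function knows or cares which kind of directory it is reading.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_samples(data_dir: str, suffix: str = ".csv") -> Iterator[Tuple[str, int]]:
    """Yield (file name, line count) for every matching file under data_dir.

    Only plain pathlib I/O is used, so data_dir can be a local folder or a
    directory where a lakeFS repository path has been mounted.
    """
    for path in sorted(Path(data_dir).rglob(f"*{suffix}")):
        with path.open("r", encoding="utf-8") as f:
            yield path.name, sum(1 for _ in f)


if __name__ == "__main__":
    # Early experimentation: a small local sample (hypothetical path).
    # Later, point the same call at a mounted lakeFS directory instead,
    # e.g. iter_samples("./mnt/datasets") (also a hypothetical path).
    for name, n_lines in iter_samples("./sample-data"):
        print(name, n_lines)
```

Moving from prototype to production only changes the argument passed in; no storage SDK or object store client is added to the code.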

Performance Optimized for the Demanding Data Scientist

While lakeFS Mount is incredibly easy to use, it doesn’t compromise on performance. It employs advanced I/O patterns such as:

  • Metadata Prefetching: Leverage lakeFS’ efficient metadata storage to avoid expensive server round trips for listing and stat-ing files
  • Content-Addressable File Caching: Efficiently caches data based on its lakeFS identity to allow quick random access
  • Lazy Fetching: Only fetches data when it’s actually needed, optimizing both speed and resource usage
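To show how these three patterns fit together, here is a toy model in Python. It is a sketch under stated assumptions, not lakeFS Mount’s actual implementation. `list_entries` and `fetch_object` are hypothetical callables standing in for a bulk metadata listing and an object store download. A real mount is also likely to cache at a finer granularity than whole objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entry:
    path: str
    checksum: str  # content identity of the object (assumed to be available)
    size: int


class MountCacheSketch:
    """Toy model of metadata prefetching, content-addressable caching and
    lazy fetching. NOT lakeFS Mount's real code; all names are hypothetical."""

    def __init__(self,
                 list_entries: Callable[[], List[Entry]],
                 fetch_object: Callable[[str], bytes]):
        # Metadata prefetching: one bulk listing up front, after which
        # listdir/stat are answered from memory with no further round trips.
        self._meta: Dict[str, Entry] = {e.path: e for e in list_entries()}
        self._fetch = fetch_object
        # Content-addressable cache: keyed by checksum rather than path, so
        # identical content under different paths is downloaded only once.
        self._blobs: Dict[str, bytes] = {}

    def listdir(self, prefix: str) -> List[str]:
        return sorted(p for p in self._meta if p.startswith(prefix))

    def stat(self, path: str) -> int:
        return self._meta[path].size

    def read(self, path: str, offset: int = 0, length: int = -1) -> bytes:
        checksum = self._meta[path].checksum
        # Lazy fetching: bytes are downloaded on first read, not at mount time.
        if checksum not in self._blobs:
            self._blobs[checksum] = self._fetch(checksum)
        data = self._blobs[checksum]
        end = len(data) if length < 0 else offset + length
        return data[offset:end]
```

In this model, listing and stat-ing never touch the object store. A second path that points at already-cached content (for example, an unchanged file in another commit) is served from the cache without a new download.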

These optimizations ensure that lakeFS Mount can handle the most demanding workloads, preventing expensive GPUs from being bottlenecked by object store access times during training runs. Read more about how lakeFS Mount optimizes for high performance and deep learning workloads.

Accelerating Development and Production Workflows

With lakeFS Mount, you no longer have to write and maintain integration code for each external data source. This frees you to focus on what you do best: building and deploying innovative machine learning models. In production, lakeFS Mount’s performance optimizations ensure that your models run efficiently, making the most of your hardware investments.

Not only that, but lakeFS Mount has reproducibility built in. When you mount a path within a Git repository, Git automatically tracks which version of the data was mounted, linking code and input data together. Simply check out an older version of the code, and you’ll automatically get the version of the data that code ran against!
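lakeFS Mount handles this bookkeeping for you. The sketch below only illustrates the underlying idea: pin an immutable lakeFS commit ID in a small file that is committed to Git alongside the code. The file name `data.lock.json`, its JSON layout, and both helper functions are hypothetical and do not reflect the format lakeFS Mount actually uses. The `lakefs://<repository>/<ref>/<path>` URI shape is the standard lakeFS addressing scheme.

```python
import json
from pathlib import Path

# Hypothetical lock file, committed to Git together with the code.
LOCK_FILE = "data.lock.json"


def pin_data_version(repo_dir: str, repository: str, commit_id: str, path: str) -> Path:
    """Record which immutable lakeFS commit (and path) this code version uses."""
    lock = Path(repo_dir) / LOCK_FILE
    pin = {"repository": repository, "commit": commit_id, "path": path}
    lock.write_text(json.dumps(pin, indent=2) + "\n", encoding="utf-8")
    return lock


def source_uri(repo_dir: str) -> str:
    """Rebuild the lakeFS URI to mount from whatever lock file this checkout holds."""
    pin = json.loads((Path(repo_dir) / LOCK_FILE).read_text(encoding="utf-8"))
    return f"lakefs://{pin['repository']}/{pin['commit']}/{pin['path']}"
```

Because the lock file is versioned with the code, checking out an older Git commit restores an older lock file. That in turn points the mount at the older, immutable lakeFS commit.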

Watch this quick tutorial to see how this works in practice:

Getting Started with lakeFS Mount

Want to try lakeFS Mount? Grab your access token here.
