Last updated on July 3, 2024

I am happy to share that lakeFS Mount is now available (currently released in private preview). lakeFS Mount allows for mounting a repository (or a specific path within one) as a local filesystem. 

What is a mount? 

A filesystem mount is the ability to present a local device or a remote location as a local directory. It is a basic feature provided by all operating systems and is widely used by system admins and developers.

Mount for object storage

Mounting an object store location as a POSIX directory isn’t a novel concept (see S3 Mountpoint, gcs-fuse and blobfuse). These tools provide an abstraction layer on top of the object store that allows it to appear as a directory on the machine. Reading and writing objects behaves exactly like reading and writing files from a local drive. There are several reasons why this is beneficial:

  1. Ease of integration: Typically, interacting with an object store requires an SDK and custom code to handle network calls, authentication and configuration. Reading and writing local files, on the other hand, is ubiquitous and supported by pretty much any framework, tool or language.
  2. Compatibility: Developers can implement their logic once against a local directory, then swap that directory out for a mounted object store when required (see the sketch after this list).
  3. Separation of concerns: A data scientist can focus on business logic rather than I/O scalability, while software developers and operators can take that logic and run it at larger scale by simply replacing the input directory with a mount.
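To make the compatibility point concrete, here's a minimal sketch (the directory paths and function name are made up for illustration): the same plain file I/O works whether it points at a local folder or at a mounted object store location.

import os

def load_documents(data_dir: str) -> list[str]:
    # Read every .txt file in a directory using plain local I/O – no SDK required
    docs = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".txt"):
            with open(os.path.join(data_dir, name)) as f:
                docs.append(f.read())
    return docs

# During development, point at a local folder:
docs = load_documents("./sample_data")

# Later, point at a mounted object store path – the code itself is unchanged:
docs = load_documents("/mnt/my-bucket/datasets")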

However, while these benefits are real, existing object storage mount solutions often fall short on performance and consistency, especially in machine learning and deep learning environments. In this article, we'll review why that is, and how lakeFS, as a data version control system, is well positioned to provide a performant and consistent object storage mount.

Why mounting object storage typically leads to poor performance

Applications (and libraries) expect file system metadata operations to be very cheap

Here’s a quick example: see this small bit of code, commonly found in ML applications:

import tensorflow as tf

dataset = tf.data.Dataset.list_files("/path/*.txt")

Let’s run this through strace to see what this does:

strace --summary -f -e stat  -- python3 ./load_dataset.py
...
[pid 274851] +++ exited with 0 +++
+++ exited with 0 +++
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    2.048897          31     64344           stat
------ ----------- ----------- --------- --------- ----------------
100.00    2.048897          31     64344           total

Looking at the summary above, we see that our one line of Python actually triggered 64,344 filesystem operations just to look up metadata.

Now, imagine every such operation has to be translated into an HTTP call with ~10-50 ms of latency (vs the current average of 31 microseconds). This could substantially slow down our script even before we read a single byte of actual data.

Some implementations (such as S3 Mountpoint) get around this by allowing the user to sacrifice some consistency for performance, caching metadata responses.

This alleviates some of that latency, but not all of it – many applications will still attempt to stat each file (or readdir every directory), expecting the operation to be very cheap. Caching helps by doing this only once per file, but every file encountered for the first time still incurs an unavoidable round trip.

(Deep learning) applications will typically read the same file many, many times

Training deep neural networks requires passing over the same input multiple times as the network is trained, so training is often bottlenecked on I/O rather than on those expensive GPUs. In practice, this translates to the same files being accessed thousands of times in relatively quick succession. Feeding the GPU optimally depends not only on overall throughput (at which object stores excel) but also on latency.

Especially with smaller files, object stores are notorious for a relatively long time to first byte (TTFB), sometimes reaching dozens of milliseconds. While this is fine for smaller datasets, at high throughput we care about latency – remember Little's law, which states:

L=λW

Where L is the number of requests that are being processed, λ is the client QPS (query per second) and W is the average latency for a client request to be processed.

From this, we can deduce that in order to achieve high throughput, we really should care about our average latency, and aim to reduce it.
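As a quick back-of-the-envelope illustration (the target request rate below is an assumption; the two latencies are the 31 microseconds measured above and a ~30 ms object store round trip as a representative value), Little's law tells us how much concurrency we'd need to keep a pipeline fed:

# Little's law: L = λ * W
target_qps = 10_000  # λ: metadata requests per second we want to sustain (assumed)

for latency_s in (0.000031, 0.030):  # 31 µs (local stat) vs ~30 ms (object store round trip)
    in_flight = target_qps * latency_s
    print(f"W = {latency_s * 1e3:.3f} ms -> ~{in_flight:.1f} requests in flight")

# W = 0.031 ms -> ~0.3 requests in flight
# W = 30.000 ms -> ~300.0 requests in flight

In other words, at object store latencies we'd need hundreds of requests in flight just to sustain the same rate – or, more realistically, we wouldn't keep up and the GPU would sit idle.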

How lakeFS Mount optimizes deep learning workloads

lakeFS Mount allows for mounting a repository – or a specific path within one – as a local filesystem. When mounting, the everest command line utility spins up a local FUSE or NFS server (listening on localhost) and mounts it to a local path in the operating system:

how lakeFS Mount works

For this post, I’ll focus on read-only mounts, referring to a specific lakeFS commit.

Sub-millisecond metadata operations

The first and major improvement utilized by lakeFS Mount is its ability to efficiently prefetch a commit’s metadata onto a local cache dir. This leverages lakeFS’ architecture and data model: commits in lakeFS are represented as a set of pointers to data on an object store.

Each pointer is a key/value pair: The key is the logical path within the lakeFS repository (for example: ”data/file.parquet”) and the value is a structure with the following attributes:

  • size – Size of the object in bytes
  • mtime – Last modification date
  • physical_address – Location of the actual data file on the object store
  • user_metadata – Additional, user-controlled attributes
  • identity – A collision-resistant identifier representing this object (based on path, size, etag and other attributes)
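To make this layout concrete, here's an illustrative sketch of a single commit entry as a Python structure. The field names follow the list above, but the class itself is not the actual lakeFS implementation, and the values are hypothetical:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class CommitEntry:
    path: str              # key: the logical path within the lakeFS repository
    size: int              # size of the object in bytes
    mtime: int             # last modification time (epoch seconds)
    physical_address: str  # location of the actual data file on the object store
    identity: str          # collision-resistant identifier for this object
    user_metadata: dict = field(default_factory=dict)  # additional, user-controlled attributes

entry = CommitEntry(
    path="data/file.parquet",
    size=1_234_567,
    mtime=1_719_964_800,
    physical_address="s3://example-bucket/data/objects/ab12cd34",  # hypothetical address
    identity="e3b0c44298fc1c149afbf4c8996fb924",                   # hypothetical identity
)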

These key/value pairs are stored as immutable chunks (1-8MB in size) – each one representing some lexicographical range within a commit. They are stored in RocksDB-compatible sstable files, which are named by the hash of all included identities.

These range files are then referenced by a “meta-range” – a special type of range that points to other range files, constructing a Merkle tree.

lakeFS range files (on the right), efficiently storing file system metadata

With this layout, lakeFS Mount can pre-fetch these range files very efficiently:

lakeFS Mount can pre-fetch range files efficiently

Once pre-fetched, the local mount server is then able to satisfy all filesystem metadata operations (such as stat and readdir) directly from these sstables. This is a very fast lookup – sstables tend to be relatively compact and are optimized for random access reads – many orders of magnitude faster than an object store lookup.

Data prefetching and caching

Caching data is notoriously hard to get right, especially when we care about consistency and reproducibility. As the famous saying goes, There are only two hard things in Computer Science: cache invalidation and naming things.

However, lakeFS commits – along with their meta-ranges and ranges – are guaranteed to be immutable! Furthermore, each file within such a commit has an identity; if we store files in the cache keyed by that identity, we can also reuse a cached object across commits without sacrificing consistency. This means there's no invalidation to worry about – our eviction algorithm only has to keep the most frequently accessed objects in storage.

lakeFS Mount implements this using a read-through cache: when objects are requested by the operating system, the mount server will first look them up in the cache dir based on their identity. If it is not found, the file will be fetched from the remote object store into the cache dir and then served from there. 

This is both simple and effective – subsequent reads and seeks happen locally.
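Here's a minimal sketch of that read-through lookup – the cache location and helper names are hypothetical, not the actual lakeFS Mount internals:

import os

CACHE_DIR = "/tmp/lakefs-mount-cache"  # hypothetical cache directory

def open_object(identity: str, fetch_to_path):
    # Serve an object from the local cache, fetching it from the object store on first access.
    # `identity` is the commit entry's collision-resistant identifier;
    # `fetch_to_path` is any callable that downloads the object's bytes to a given local path.
    cached_path = os.path.join(CACHE_DIR, identity)
    if not os.path.exists(cached_path):
        # Cache miss: fetch once, then serve locally from now on.
        os.makedirs(CACHE_DIR, exist_ok=True)
        fetch_to_path(cached_path)
    # Cache hit (or freshly populated): all subsequent reads and seeks are local.
    return open(cached_path, "rb")

Because the cache key is the identity rather than the path, an unchanged file referenced by multiple commits is cached only once.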

The second part of the story is using that cache optimally – most workloads are fairly deterministic, so we can anticipate with high accuracy which objects are likely to be accessed. In some cases, it can be beneficial to pre-fetch them before processing begins. For this, lakeFS Mount allows granular pre-fetching, not only for metadata as described above, but also for data files.

Getting started with lakeFS Mount

Prerequisites: 

  1. A working lakeFS Server running either lakeFS Enterprise or lakeFS Cloud
  2. You’ve installed the lakectl command line utility: this is the official lakeFS command line interface, on top of which lakeFS Mount is built. 
  3. lakectl is configured properly to access your lakeFS server, as detailed in the configuration instructions.

To use lakeFS Mount, request a private preview to download and install the everest command line utility, currently available for macOS and Linux.

Mounting a path to a local directory:

$ everest mount lakefs://repository/reference/path/ ./my_local_directory

Once the command completes, the specified path is mounted at my_local_directory.
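At this point, the tf.data one-liner from the beginning of this post can be pointed at the mounted directory unchanged (using the example mount target from above):

import tensorflow as tf

# Same listing as before, now reading through the lakeFS mount
dataset = tf.data.Dataset.list_files("./my_local_directory/*.txt")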

To unmount the directory, simply run:

$ everest umount ./my_local_directory

This will unmount the path and terminate the local mount server.

Of course, there are a lot of knobs that we can turn here to improve I/O efficiency – prefetching parallelism, usage of pre-signed URLs, cache directory size and location, and many others. For a complete list, visit the lakeFS Mount official documentation.

What’s Next?

A lot! 

lakeFS Mount is still in private preview, expected to be in GA by the end of Q3 2024. 

On its roadmap for the second half of the year:

  • Granular data pre-fetching strategies
  • Native Kubernetes support via a CSI Driver
  • Write support (by branching out of the mounted commit and applying changes in isolation!)

This is only a partial list, of course. If you have a use case for lakeFS Mount and would like to help shape its future, we're actively looking for design partners to help us make it the highest-performance way to use object stores for deep learning, data science and data engineering.
