When integrating two technologies, the aim should be to expose the strengths of each as much as possible.
With this philosophy in mind, we are excited to announce the beta release of the lakeFS FileSystem! This native Hadoop FileSystem implementation lets Spark applications running on lakeFS get the best of both worlds: Spark workers can use their full capacity for distributed data operations, while lakeFS provides versioning for large-scale datasets.
In this article, we will explain what the lakeFS FileSystem is, how it works, and how you can use it in your own Spark applications!