Tal Sofer

Data Engineering Integrations

One Spark job, Many Data Sources – How to Easily Use lakeFS with Spark

Jonathan Rosenberg, Tal Sofer
August 15, 2022

lakeFS is an interface to the data lake, or the parts of the data lake one chooses to version control. The lakeFS interface is S3 compatible, and hence easily used with all common data applications, including Spark. In some cases, lakeFS is first adopted by the teams responsible for the data ingested to the lake, …

Integrations Machine Learning

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman
November 21, 2022

Introducing Kubeflow and lakeFS. Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a …

Data Engineering Project

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer
March 24, 2022

Combining lakeFS and Spark sets a new standard for scale and elasticity in distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the beta release of the lakeFS FileSystem! This native Hadoop FileSystem …
