Oz Katz

Data Engineering

Proudly announcing lakeFS Cloud

Einat Orr, PhD, Oz Katz
June 27, 2022

What is lakeFS? As data practitioners, we use many different terms to talk about what we do – we call it business intelligence, analytics, data pipelines, or insights. But there’s one term that captures what we do really well: delivering products. When we were leading a large R&D organization, we couldn’t help but wonder about …

Project

The lakeFS playground is now live and everybody can play!

Oz Katz, Michal Wosk
March 2, 2022

What if you could manage your data lake just like you manage code? With rollback, versioning, and branching capabilities on top of your existing data lake? lakeFS is an open-source project that provides a Git-like version control interface for data lakes, with seamless integration to most data tools and frameworks. lakeFS enables you to easily …

Data Engineering

Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared

Oz Katz
March 26, 2022

Introduction When building a data lake, there is perhaps no more consequential decision than the format data will be stored in. The outcome will have a direct effect on its performance, usability, and compatibility. It is inspiring that by simply changing the format data is stored in, we can unlock new functionality and improve the …

Data Engineering Project

lakeFS Hooks: Implementing CI/CD for Data using Pre-merge Hooks

Oz Katz
March 2, 2021

Continuous integration of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment of data ensures the quality of data at each step of a production pipeline. In this blog, I will present lakeFS’s webhooks, and showcase a …

Data Engineering

Chaos Data Engineering

Oz Katz
May 19, 2021

Modern data lakes are a complexity tar pit. They involve many moving parts: distributed computation engines, running on virtualized servers connected by a software-defined network, running on top of distributed object stores, orchestrated by a distributed stream processor or pipeline execution engine. These moving parts fail. All the time. Handling these failures is not …

Data Engineering Project

Introducing lakeview: A Visibility Tool for AWS S3 Based Data Lakes

Oz Katz
May 19, 2021

Lakeview is a new open-source visibility tool for AWS S3-based data lakes. Think of it as ncdu, but for petabyte-scale data. Its goal is to provide you with an easy way to see the total size of your S3 bucket (prefix) storage. Instead of scanning billions of objects using the S3 API, which …
