Oz Katz

Data Engineering

Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared

Oz Katz
April 13, 2021

Introduction: When building a data lake, there is perhaps no decision more consequential than the format in which data is stored. The choice directly affects the lake's performance, usability, and compatibility. It is inspiring that by simply changing the format data is stored in, we can unlock new functionality and improve the …

Data Engineering Project

lakeFS Hooks: Implementing CI/CD for Data using Pre-merge Hooks

Oz Katz
March 2, 2021

Continuous integration of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment of data ensures the quality of data at each step of a production pipeline. In this post, I will present lakeFS’s webhooks, and showcase a …

Data Engineering

Chaos Data Engineering

Oz Katz
March 24, 2021

Modern data lakes are a complexity tar pit. They involve many moving parts: distributed computation engines, running on virtualized servers connected by a software-defined network, running on top of distributed object stores, orchestrated by a distributed stream processor or pipeline execution engine. These moving parts fail. All the time. Handling these failures is not …

Data Engineering Project

Introducing lakeview: A Visibility Tool for AWS S3 Based Data Lakes

Oz Katz
March 8, 2021

Lakeview is a new open-source visibility tool for AWS S3-based data lakes. Think of it as ncdu, but for petabyte-scale data. Its goal is to provide you with an easy way to see the total size of your S3 bucket (prefix) storage. Instead of scanning billions of objects using the S3 API, which …

Data Engineering

Diary of a Data Engineer

Oz Katz
March 8, 2021

A glimpse into the life of a data engineer. Day 1: Finally, an easy one. Got a pretty simple task for a change – read a new type of event stream generated by sales, and publish it to the data lake. Sounds like a straightforward ETL. I estimate this as one day of work. I …
