Oz Katz

Data Engineering Project

Introducing lakeview: A Visibility Tool for AWS S3 Based Data Lakes

Oz Katz
October 12, 2020

Lakeview is a new open-source visibility tool for AWS S3 based data lakes. Think of it as ncdu, but for petabyte-scale data. Its goal is to give you an easy way to see the total size of the storage under an S3 bucket or prefix. Instead of scanning billions of objects using the S3 API, which …


Data Engineering

Diary of a Data Engineer

Oz Katz
September 15, 2020

Day 1: Finally, an easy one

Got a pretty simple task for a change – read a new type of event stream generated by sales, and publish it to the data lake. Sounds like a straightforward ETL. I estimate this as one day of work. I can reuse a bunch of code from similar jobs …


LakeFS
