Data Engineering

Data Engineering Project

How to Manage Your Data the Way You Manage Your Code

Einat Orr, PhD.
January 27, 2021

Fifty years ago, it was very hard to collaborate on code. When developing large-scale software projects, it was difficult to manage changes to source code over time, as revision control tools were only starting to enter mainstream computing. The adoption of version control tools, first centralized and then distributed, changed all that, and now …

Data Engineering

Diary of a Data Engineer

Oz Katz
December 30, 2020

Day 1: Finally, an easy one. Got a pretty simple task for a change – read a new type of event stream generated by sales, and publish it to the data lake. Sounds like a straightforward ETL. I estimate this as one day of work. I can reuse a bunch of code from similar jobs …

Data Engineering

How to Pick the Right Postgres for your Application

Ariel Shaqed (Scolnicov)
September 11, 2020

Lots of applications require a Postgres database. Before you can install them, you will need a Postgres database. How do you pick the right Postgres for your application? There is a bewildering variety of ways to acquire a database running on a Postgres instance, but the biggest choice is “build or buy”: whether to …

Data Engineering Project

The Quick Guide for Running Presto Locally on S3

Guy Hardonag
November 30, 2020

This post covers our experience running Presto in a local environment with the ability to query Amazon S3 and other S3-compatible systems. We will: describe the components needed and how to configure them; provide a dockerized environment you can run; and show an example of running the provided environment and querying a publicly …

Data Engineering

Data Versioning – Does it mean what you think it means?

Einat Orr, PhD.
November 30, 2020

When we first thought about a tagline for lakeFS, our recently released OSS project, we instinctively used terms such as “Data versioning”, “Manage data the way you manage code”, “It’s git for data”, and any random variation of the three that is a grammatically correct sentence in English. We were very pleased with ourselves for …
