Paul Singman

Data Engineering

Towards Effective DataOps

Paul Singman
May 11, 2022

Gain the confidence to mess with your data without making a mess of your data. “If it hurts, do it more often” is a wise piece of advice that DevOps engineers often repeat. Unless you are a masochist, following this advice will naturally lead you to find ways to make the process being repeated less painful. …

Data Engineering

How Easy It Is to Re-use Old Pandas Code in Spark 3.2?

Paul Singman
May 11, 2022

In October, it was announced that the Pandas API was being integrated with Spark. This was particularly exciting news for a Pandas-baby like myself, whose first exposure to data analytics was Pandas-based notebook tutorials. Spark 3.2 has been out for several months now and a curiosity has been building inside me – how easy it is to …

Data Engineering Integrations

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Paul Singman
May 11, 2022

Introduction: Dockerize Your Data Pipeline
I can remember times when my company started using a new technology (be it Redis, Kafka, or Spark) and in order to try it out I found myself staring at a screen like this:
At the time I thought nothing of doing this. And even wore it as a badge of pride …

Data Engineering

The Guide to Data Versioning

Paul Singman
May 31, 2022

“I have never lied to you, I have always told you some version of the truth.”
“The truth doesn’t have versions, okay?”
(Something’s Gotta Give, 2003)
Jack Nicholson and Diane Keaton discuss data versioning in Something’s Gotta Give.
Introduction
A version of something is defined as “a particular form in which some details are different …

Data Engineering Hive Metastore

Takeaways From the Future of Metadata After Hive Metastore Roundtable

Paul Singman
May 11, 2022

Overview of Hive’s Metastore
Let’s get right into it. This is not an objective recap of every topic covered at the Future of Metadata After Hive Roundtable last week. But it is a summary of what I found most interesting from the discussion between panelists Lior Ebel, Ryan Blue, Seshu Adunuthula and host Oz Katz. Watch the full talk below! …

Integrations

dbt Tests – Create Staging Environments for Flawless Data CI/CD

Guy Hardonag, Paul Singman
May 11, 2022

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s recent support for big data compute tools like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination …

Project

lakeFS Community Call Recap – Oct. 2021

Paul Singman
December 2, 2021

Last week we held another lakeFS Community Call! We believe these calls are invaluable opportunities to have direct dialogue with our users on all things lakeFS. Oz covered important new lakeFS functionality, previewed what’s coming soon from the roadmap, and also shared two exciting updates from the community. Let’s recap!
6 Important lakeFS Releases
1. …

Data Engineering

3 Ways to Add Data to lakeFS

Paul Singman
May 11, 2022

Few people start using lakeFS without first having some data collected. Consequently, it is common that after getting it up and running, one of the first things people do is import their existing data to lakeFS. There isn’t a one-size-fits-all approach for doing this. Instead, there are ways that work great for a single file, …

Project

lakeFS – Data Versioning at Scale

Paul Singman
March 24, 2022

If you think about it, lakeFS is about two things: version control and big data. We see ourselves as bringing version control to big data. This bridges a workflow gap that currently exists between working with data and working with code. This gap is purely artificial; there’s no conceptual reason why different workflows should be required for …

