Data Engineering

Data Engineering

Clearing the mess – How to ensure data quality with versioning

The lakeFS team
May 11, 2022

The last decade saw an unprecedented rise in the number of organizations that base their decisions and operations on data. The number of digital products that collect and process data, and use it to fuel decision-making algorithms that enhance future services, is also growing rapidly. That’s why data and data quality …

Data Engineering

5 Painful mistakes data engineers make, and how to avoid them

The lakeFS team
June 6, 2022

Modern data engineering practices are leading more and more organizations to make broader use of object stores. This is driven by the rising scale and complexity of the data they manage, along with the growing variety of use cases that these data warehouses need to cater to: from machine learning and algorithm development to analytics …

Data Engineering Hive Metastore

Takeaways From the Future of Metadata After Hive Metastore Roundtable

Paul Singman
May 11, 2022

Overview of Hive’s Metastore: Let’s get right into it. This is not an objective recap of every topic covered at the Future of Metadata After Hive Roundtable last week. But it is a summary of what I found most interesting from the discussion between panelists Lior Ebel, Ryan Blue, Seshu Adunuthula and host Oz Katz. Watch the full talk below! …

Data Engineering

The Docker Everything Bagel™ – Spin Up A Local Data Stack

Paul Singman
May 24, 2022

Update Dec 16, 2021: Part II of the Everything Bagel series is published! Introduction: An important part of developing an open source project like lakeFS is assisting and advising our users. When they run into an issue and feel pain, we want to feel that pain, too. Quite literally. This means recreating …

Data Engineering

Making Sure Your Data Lifecycle Management Makes Sense

Paul Singman, Einat Orr, PhD.
May 11, 2022

What is Data Lifecycle Management? Datasets are the foundational output of a data team. They do not appear out of thin air. No one has ever snapped their fingers and created an orders_history table. Instead, useful sets of data are created and maintained through a process that involves several predictable steps. Managing …

Data Engineering Integrations

Air & Water: The Airflow and lakeFS Integration

Itai Admi
May 27, 2021

Today we are excited to announce the official release of the lakeFS Airflow provider! This package lets you easily integrate lakeFS functionality into your Airflow DAGs. The library is published on PyPI, so it can be installed in your project with the command: pip install airflow-provider-lakefs Once installed, you are …
