Data Architecture

Data Engineering

Clearing the mess – How to ensure data quality with versioning

The lakeFS team
May 11, 2022

The last decade saw an unprecedented rise in the number of organizations that base their decisions and operations on data. The number of digital products that collect and process data and use it to fuel decision-making algorithms for enhancing future services is also growing at a very fast pace. That’s why data and data quality …

Data Engineering

5 Painful mistakes data engineers make, and how to avoid them

The lakeFS team
June 6, 2022

Modern data engineering practices lead more and more organizations to a broader use of object stores. This happens due to the rising scale and complexity of the data that they manage – along with the growing variety of use cases that these data warehouses need to cater to: from machine learning and algorithm development, to analytics …

Data Engineering, Hive Metastore

Takeaways From the Future of Metadata After Hive Metastore Roundtable

Paul Singman
May 11, 2022

Overview of Hive’s Metastore: Let’s get right into it. This is not an objective recap of every topic covered at the Future of Metadata After Hive Roundtable last week. But it is a summary of what I found most interesting from the discussion between panelists Lior Ebel, Ryan Blue, Seshu Adunuthula and host Oz Katz. Watch the full talk below! …

Integrations

dbt Tests – Create Staging Environments for Flawless Data CI/CD

Guy Hardonag, Paul Singman
May 11, 2022

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s recent support for big data compute tools like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination …

Data Engineering

Thoughts on the Future of the Databricks Ecosystem

Paul Singman
May 11, 2022

Databricks has come a long way since growing out of a Berkeley lab in 2013 with an open-source distributed computing framework called Spark. Fast forward eight years and, in addition to the core Spark product, there is a dizzying number of new features in various stages of public preview within the Databricks platform. In case …
