lakeFS Blog

Announcements Integrations

lakeFS ❤️ DuckDB: Embedding an OLAP database in the lakeFS UI

Oz Katz
February 5, 2023

We’re happy to introduce experimental support for running SQL queries directly on objects (your files!) in lakeFS, all through the lakeFS UI. No need to install or configure anything. TL;DR: You can now run SQL queries on Parquet and other tabular formats, directly from the lakeFS UI! Explore data, look at its schema, compare …

Thought Leadership

4 New Year Resolutions for Data Engineering Leaders and How You Can Achieve Them

Michal Wosk
December 11, 2022

Introduction As 2022 draws to a close, many engineering leaders use this time to reflect on the past year and start planning for the year ahead. Data engineering teams are usually swamped with tasks and requirements, which are often accompanied by failures and issues that require immediate attention. Therefore using …

Integrations Use Cases

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri
December 6, 2022

Apache Airflow enables you to build multistep workflows across multiple technologies. The programmatic approach, which allows you to schedule and monitor workflows, helps users build complicated ETLs on their data that would otherwise be difficult to achieve automatically. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multi-step advanced transformations: The challenge …

Best Practices Data Engineering

CI/CD for data pipelines – The Shortest Path to Your Destination with lakeFS

The lakeFS team
February 7, 2023

Overview Continuous integration (CI) of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment (CD) of data ensures the quality of data at each step of a production pipeline. These approaches are commonly used by application developers of …

Best Practices Data Engineering

Data Version Control – A Data Engineering Best Practice You Must Adopt

Einat Orr, PhD.
January 3, 2023

Imagine the software engineering world before distributed version control systems like Git became widespread. This is where the data world currently stands. The explosion in the volume of generated data forced organizations to move away from relational databases and instead store data in object storage. This escalated manageability challenges that teams need to address …

Data Engineering

Data Reproducibility and other Data Lake Best Practices

The lakeFS team
January 16, 2023

Overview Data changes frequently, making it difficult to keep track of its exact state over time. Oftentimes, people maintain only one state of their data: its current state. Data lake best practices require reproducibility that lets us time travel between different versions of the data, giving us a snapshot of the data at different times …

Community

lakeFS Community: Leonard Aukea nominated for Machine Learning Professional of the year!

Adi Polak
November 20, 2022

Our community is full of people with incredible skills and know-how. And this nomination proves us right! Our community member @Leonard Aukea has been nominated for Machine Learning Professional of the year as part of the Nordic DAIR Awards. Congratulations, Leonard! Who is Leonard? Leonard Aukea has been heading Machine Learning Engineering and Operations at …
