lakeFS Blog

Data Engineering

Towards Effective DataOps

Paul Singman
May 11, 2022

Gain the confidence to mess with your data without making a mess of your data. “If it hurts, do it more often.” is a wise piece of advice that DevOps engineers often repeat. Unless you are a masochist, following this advice will naturally lead you to find ways to make the process being repeated less painful.  …


Data Engineering

Clearing the mess – How to ensure data quality with versioning

The lakeFS team
May 11, 2022

The last decade saw an unprecedented rise in the number of organizations that base their decisions and operations on data. The number of digital products that collect and process data and use it to fuel decision-making algorithms for enhancing future services is also growing at a very fast pace. That’s why data and data quality …


Data Engineering

5 Painful mistakes data engineers make, and how to avoid them

The lakeFS team
May 11, 2022

In today’s world of data engineering, we need to store more than just simple text information in relational or non-relational databases, tables or documents. Data formats include email, images, video, web pages, audio files, datasets, sensor data and other types of media content. Basically, a big chunk of unstructured data.  Studies have shown that somewhere …


Project

The lakeFS playground is now live and everybody can play!

Oz Katz, Michal Wosk
March 2, 2022

What if you could manage your data lake just like you manage code? With rollback, versioning, and branching capabilities on top of your existing data lake? lakeFS is an open-source project that provides a Git-like version control interface for data lakes, with seamless integration to most data tools and frameworks. lakeFS enables you to easily …


Data Engineering

Closing the Gap: Lifecycle Management for Data Products

Einat Orr, PhD.
March 7, 2022

As data practitioners, we use many different terms to talk about what we do: we call it business intelligence, analytics, data pipelines, or insights. But there’s one term that captures what we do really well: delivering products. When I was leading a 200-person engineering team at SimilarWeb, I couldn’t help but notice about …


Data Engineering

Level Up Your Data Lake

Paul Singman
May 11, 2022

What is the Basic Data Lake? A data lake is primarily two things: an object store and the objects being stored. It might look something like this: [Diagram: a basic data lake]. Even with this basic setup, your data is in a good position to support all three of the main use cases for data: 1. BI Analytics 2. Data-Intensive APIs …


Data Engineering

How Easy Is It to Re-use Old Pandas Code in Spark 3.2?

Paul Singman
May 11, 2022

In October, it was announced that the Pandas API was being integrated with Spark. This was particularly exciting news for a Pandas-baby like myself, whose first exposure to data analytics was Pandas-based notebook tutorials. Spark 3.2 has been out for several months now and a curiosity has been building inside me – how easy it is to …


Project

In the Realm of the New Open Data Stack: Joining the lakeFS Adventure

Adi Polak
May 11, 2022

Trends in the data industry, how lakeFS fits into the new data stack, and a personal story of why I chose to join lakeFS. Open Source Is Everywhere! For the past decade, open source has made a home in the ever-evolving data stack zoo known as the Hadoop ecosystem. To keep everything in order, ZooKeeper was …

