Data Version Control in R with lakeFS
How and why you should use lakeFS to provide data version control for your data lake[house] when working with R. Hands-on examples and code snippets.
Organizations need data governance for many reasons, not just to comply with a rising number of data privacy and protection rules, such as the GDPR of the European Union and the California Consumer Privacy Act (CCPA). A lack of it can cause more pain than a fine. One of the most impactful areas of data
In the traditional setup, organizations had a centralized infrastructure team responsible for managing data ownership across domains. But product-led companies started to approach this matter a little differently. Instead, they distribute the data ownership directly among producers (subject matter experts) using a data mesh architecture. This is a concept originally presented by Zhamak Dehghani in
A step-by-step guide to the lakeFS Cloud playground environment. In this document, you will learn the quickest way to get started with lakeFS, using the playground experience in lakeFS Cloud. Then I will cover how to connect your own storage to lakeFS, so you can run lakeFS against your own data. Step 1:
We are pleased to announce that lakeFS Cloud is now available as a self-service offering on Azure. lakeFS Cloud is a fully managed lakeFS platform, providing version control for your data lake. As well as being secure and scalable, it includes enterprise features such as single sign-on (SSO), managed garbage collection, and role-based access control
The MLOps domain is spreading at an accelerating pace. In recent years, we’ve seen more ML products and MLOps tools than we probably need. Today, there are hundreds of tools trying to solve a bunch of problems in different ways, with some of them promising end-to-end solutions. This usually makes data practitioners confused when they
Guide to Enterprise Data Architecture, Part 3. If you look at where companies keep their analytical data, you’ll quickly see that this space has split into two major architectural and technology stacks: data warehouses and data lakes. What are their defining characteristics? What factors should you take into account when choosing a data warehouse vs.
Data is a goldmine for every organization, no matter the industry. But to make the most of it, businesses need technology to maintain and manage transactional data like payments, inventory updates, and customer records. This is where OLTP databases come in. Online Transaction Processing (OLTP) databases are used to store and process large numbers of
If you work with a smaller dataset or do one-off jobs, the way you manage backfills isn’t that crucial. But what if you face constantly growing datasets with billions to trillions of records? Your backfilling data strategy will have a much bigger impact. When dealing with modern data pipelines on such a scale, it’s key
MLOps is mostly data engineering. As organizations ride past the hype cycle of MLOps, we realize there is significant overlap between MLOps and data engineering. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models. Only 5% of developing and deploying
Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach to scheduling and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to automate. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multistep advanced transformations: The challenge
Our community is full of people with incredible skills and know-how. And this nomination proves us right! Our community member @Leonard Aukea has been nominated for Machine Learning Professional of the year as part of the Nordic DAIR Awards. Congratulations, Leonard! Who is Leonard? Leonard Aukea has been heading Machine Learning Engineering and Operations at