Iddo Avneri Author

Iddo has a strong software development background. He started his...

Last updated on April 26, 2024

Choosing the Right Version Control for Your Data Lake

lakeFS is an open-source project that brings Git-like version control to your data lake. Many teams use the open-source version, which is free of charge. However, some organizations need additional features and expert support backed by SLAs.

This is where the lakeFS Cloud and Enterprise solutions come in. 

How do these two versions differ from one another? What is the right choice for your team and organization? Let’s break down the key differences to help you pick the best option for your needs.

What is lakeFS Cloud? 

lakeFS turns object storage buckets into data lake repositories with a Git-like interface, which lets you maintain and ensure data resilience and dependability inside the data lake over time. 

Running a data lake on top of object stores brings scalability and speed, but enforcing best practices, guaranteeing good data quality, and recovering swiftly from failures remain exceedingly difficult. The data ingestion stage in particular is crucial to the integrity of the service and its data. This is where lakeFS and its data versioning capabilities come in.

lakeFS Cloud is a cloud-based version of lakeFS, available via the AWS or Azure Marketplaces.

Treeverse runs and manages a specialized lakeFS instance that communicates with data stored in your object storage, such as S3.

Benefits of lakeFS Cloud

  • Managed services – Users get access to features like automatic Garbage Collection and simplified maintenance.
  • Cost-effectiveness – lakeFS Cloud saves on compute resources because the lakeFS server and Garbage Collection run as part of the managed service.
  • Faster support – The lakeFS team can typically resolve issues faster within an environment it manages itself.
  • Lower Total Cost of Ownership (TCO) – The Cloud version handles all DevOps tasks related to lakeFS, reducing the team’s workload.

What is lakeFS Enterprise?

lakeFS Enterprise is an enterprise-ready version of lakeFS that adds the following on top of the open-source version:

  • Role-Based Access Control (RBAC)
  • Single Sign-On (SSO)
  • Support SLAs

When to choose lakeFS Enterprise

  • On-premises data – lakeFS Enterprise supports use cases where data resides on premises, outside the public cloud.

Deployment comparison: lakeFS Cloud vs. lakeFS Enterprise

lakeFS Cloud

This is essentially a hosted solution managed by Treeverse, the company behind the lakeFS open source project. It includes all the enterprise features and the benefits of SaaS, like automatic patching, monitoring, upgrades, and garbage collection.

lakeFS Enterprise

This solution runs on your own compute infrastructure. It includes everything in the open-source version of lakeFS, plus additional features such as Single Sign-On (SSO) and Role-Based Access Control (RBAC), as well as SLA-backed support.

Common misconceptions about lakeFS Cloud

Data location

Both the Enterprise and Cloud versions of lakeFS keep the data and metadata that lakeFS manages in your own object storage (S3, Azure Blob Storage, Google Cloud Storage).

Data security

Regardless of version, data and metadata always remain under your control. With pre-signed URLs, lakeFS can even version data that it cannot access directly.
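
For readers unfamiliar with pre-signed URLs, the general idea can be sketched in a few lines of Python. This is an illustration of the concept only: it is not the signing scheme used by lakeFS, S3, Azure, or Google Cloud Storage, and the secret key, the host name storage.example.com, and the function names are all hypothetical. The storage owner signs an object path together with an expiry time, and anyone holding the resulting URL can access that one object until the link expires, without ever holding the owner's credentials.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret known only to the storage owner (assumption for this sketch).
SECRET_KEY = b"storage-owner-secret"


def _signature(path: str, expires_at: int) -> str:
    """HMAC-SHA256 over the object path and its expiry timestamp."""
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(path: str, now: int, ttl_seconds: int) -> str:
    """Return a URL granting access to `path` until now + ttl_seconds."""
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at)})
    # storage.example.com is a placeholder host, not a real endpoint.
    return f"https://storage.example.com{path}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Storage-side check: the signature matches and the link has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed query parameters
    expected = _signature(parsed.path, expires_at)
    return hmac.compare_digest(sig, expected) and now <= expires_at
```

In this sketch, a URL created with presign("/bucket/data.parquet", now=1000, ttl_seconds=60) passes is_valid at time 1030 and fails at time 2000, and changing the object path in the URL invalidates the signature.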

The verdict

For cloud-based data lakes, lakeFS Cloud offers a compelling solution with its managed service, cost-efficiency, and streamlined support. However, if on-premises data is your priority, lakeFS Enterprise provides the required control and features.

Whether you choose Enterprise or Cloud, Treeverse is here to help. Contact us to discuss your unique requirements for data version control!
