lakeFS Community

Paul Singman

August 25, 2021

Everything Bagel is now lakeFS-samples

You can find Everything Bagel, along with lots of amazing hands-on examples for using lakeFS, in the lakeFS-samples repository.


An important part of developing an open source project like lakeFS is assisting and advising our users. When they run into an issue and feel pain, we want to feel that pain, too. Quite literally.

This means recreating the environment, running the same code, and raising the same error.

In complex, modern data stacks this is easier said than done. Over the past year we've developed a setup that helps us in this pursuit. Affectionately, it is referred to as the Everything Bagel.

What is the Everything Bagel?

The Everything Bagel is a multi-container Docker environment that spins up locally with a single command. It contains many of the technologies we see lakeFS commonly deployed with, including:

  • Spark
  • Hive
  • Trino
  • MinIO 

The best part – it’s publicly available right in the lakeFS GitHub repo!

In this post I'll show how to get the Docker Everything Bagel up and running on your own laptop. Along the way, I'll also explain how it works and some of the cool things you can do with your very own Everything Bagel.

Diagram of the Everything Bagel environment

Spin It Up!

The only prerequisite is having Docker installed on your machine. Once it's installed, the steps to get the Everything Bagel running are as follows:

  1. Clone the lakeFS repo: git clone https://github.com/treeverse/lakeFS.git
  2. Navigate to the deployments/compose directory: cd deployments/compose
  3. Run: docker compose up -d
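Put together, the whole spin-up looks like this (assuming the clone lands in a local lakeFS directory):

# clone the repo, enter the Compose directory, and launch everything in the background
git clone https://github.com/treeverse/lakeFS.git
cd lakeFS/deployments/compose
docker compose up -d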

That's it! The different containers will start to spin up. Once the command completes (it will take a few minutes the first time, while images are downloaded), you can check the status of the resulting containers by running docker compose ps. You should see a response in the terminal like:

NAME                     COMMAND                   SERVICE          STATUS       PORTS
compose_spark-worker_1   "/opt/bitnami/script…"    spark-worker     running      8081/tcp
compose_spark-worker_2   "/opt/bitnami/script…"    spark-worker     running      8081/tcp
compose_spark-worker_3   "/opt/bitnami/script…"    spark-worker     running      8081/tcp
compose_spark_1          "/opt/bitnami/script…"    spark            running      0.0.0.0:18080->8080/tcp, :::18080->8080/tcp
hive                     "/bin/sh -c \"/entryp…"   hive-metastore   running      0.0.0.0:9083->9083/tcp, :::9083->9083/tcp
hiveserver2              "hive --service hive…"    hive-server      running
lakefs                   "/app/wait-for postg…"    lakefs           running      0.0.0.0:8000->8000/tcp, :::8000->8000/tcp
lakefs-setup             "/app/wait-for postg…"    lakefs-setup     exited (0)
mariadb                  "docker-entrypoint.s…"    mariadb          running      3306/tcp
minio                    "minio server /data …"    minio            running      0.0.0.0:9000->9000/tcp, :::9000->9000/tcp, 0.0.0.0:9001->9001/tcp, :::9001->9001/tcp
minio-setup              "mc mb lakefs/example"    minio-setup      exited (0)
postgres                 "docker-entrypoint.s…"    postgres         running      5432/tcp
trino                    "/usr/lib/trino/bin/…"    trino            running      8080/tcp
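If a service shows a status other than running (or exited (0) for the two one-shot setup containers), its logs are the first place to look. docker compose logs is a standard Compose subcommand:

# follow the logs of a single service, e.g. lakefs
docker compose logs -f lakefs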

A Quick Note on Memory

When creating the 10+ Everything Bagel containers, it’s important to make sure the Docker application has enough memory allocated to it.

To adjust this setting, open Docker Desktop's Preferences page. From there, go to the Resources tab and make sure the Memory setting is at least 4 GB.
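You can also verify the allocation from the command line; docker info reports the total memory available to Docker (in bytes):

# print the memory available to Docker, in bytes
docker info --format '{{.MemTotal}}'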

How It Works

The key to understanding how the Docker Everything Bagel works is the docker-compose.yml file that the docker compose up -d command references by default (the -d flag runs the containers in a "detached" state).

Although this article won’t be a comprehensive review of Docker Compose files, let’s take a peek at a few of the service sections to understand what’s happening.

Let’s look at part of the config.

version: "3"
services:

  lakefs-setup:
    image: treeverse/lakefs:latest
    container_name: lakefs-setup
    depends_on:
      - postgres
      - minio-setup
    environment:
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
      - LAKEFS_DATABASE_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
      - LAKECTL_SERVER_ENDPOINT_URL=http://lakefs:8000
    entrypoint: ["/app/wait-for", "postgres:5432", "--", "sh", "-c",
      "lakefs setup --user-name docker --access-key-id AKIAIOSFODNN7EXAMPLE --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY && lakectl repo create lakefs://example s3://example"]

  minio-setup:
    image: minio/mc
    container_name: minio-setup
    environment:
      - MC_HOST_lakefs=http://minioadmin:minioadmin@minio:9000
    depends_on:
      - minio
    command: ["mb", "lakefs/example"]

  postgres:
    image: postgres:11
    container_name: postgres
    environment:
      POSTGRES_USER: lakefs

This snippet shows the sections that create the containers for postgres (a lakeFS dependency), for creating a bucket in MinIO (the underlying object store), and for setting up lakeFS itself.

Docker Compose Specifics

Within a service’s section, perhaps the most important setting is the image.

This specifies the base image used to build the container. In most cases, you can find an official image for a service on Docker Hub that installs all the packages you need to run it. For example, the lakeFS image on Docker Hub is updated automatically by a GitHub Actions workflow every time there's a new official lakeFS release, and a container built from it runs lakeFS as soon as it starts.

Next, the depends_on: key is used to control the order in which containers are created. Before starting the lakeFS service we make sure a container running postgres is up first, as well as a service that starts MinIO and creates a bucket. If you aren’t familiar with MinIO, it’s an open-source object store that maintains compatibility with the S3 API. This makes it convenient for simulating S3 in local environments as we are doing here.

Any environment variables can be listed under the environment: key, as shown in all three services above. Similarly, we can save entire config files locally and mount them into the container as a volume; the spark service contains an example of this.
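As a sketch of what such a mount looks like (the image tag and file paths here are illustrative, not copied verbatim from the Bagel's compose file):

  spark:
    image: bitnami/spark:3
    environment:
      - SPARK_MODE=master
    volumes:
      # mount a locally saved config file into the container's conf directory
      - ./etc/hive-site.xml:/opt/bitnami/spark/conf/hive-site.xml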

The last thing I'd like to cover is the entrypoint: key. This is the command or commands we want to run inside the container. Usually this simply starts the relevant service. In the lakefs-setup service, however, we perform the additional steps of creating a lakeFS user and repository (via the lakectl command line tool) so these steps don't need to be performed manually each time.

After all, there is little value in a repository-less lakeFS instance.
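Since that setup step bakes in the access keys shown above, you can point a local copy of lakectl at the running instance to confirm the repository exists. This is a sketch that assumes lakectl is installed on your machine and picks up its configuration from LAKECTL_* environment variables:

# point lakectl at the Bagel's lakeFS instance and list repositories
export LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
export LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
lakectl repo list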

Setting these keys correctly lets us run pretty much any service we want inside an isolated Docker environment. Pretty cool!

Using the Everything Bagel

Once you have the Everything Bagel up and running, you can do a variety of things, such as…

  1. Connecting to the Hive and Trino clients [docker compose --profile client run --rm trino-client] and creating tables or querying data (see the sketch after this list)
  2. Hopping into the spark container [docker compose exec spark bash] and testing out spark-submit jobs
  3. Logging into the lakeFS (http://localhost:8000) and MinIO (http://localhost:9000) UIs to see how Spark, Hive, and Trino's operations are reflected
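For instance, once inside the Trino client, a statement like the following creates a schema backed by the example lakeFS repository (the s3 catalog name and bucket layout are assumptions about how the Bagel's Trino catalog is configured):

-- create a schema whose tables live on the main branch of the example repo
CREATE SCHEMA s3.tiny WITH (location = 's3a://example/main/tiny');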

Try different things out, let us know how it goes!

Looking Ahead

This is an introduction to the Bagel. I hope you learned a bit about Docker and how it can be used to easily create complex environments locally. Next time we’ll dive into a more advanced use case showing the Everything Bagel in action!

We’re always continuing to improve it as well, making the Docker environment simpler and adding other relevant technologies.

About lakeFS

The lakeFS project is an open source technology that provides a git-like version control interface for data lakes, with seamless integration to popular data tools and frameworks.

Our mission is to maximize the manageability of open source data analytics solutions that scale.
