What you’ll learn and how you can apply it
- Get an overview of the latest technologies for storing and managing your data
- Learn cutting-edge strategies for optimizing and deploying your cloud data lake
- Understand how to implement access control to maintain data privacy in your cloud data warehouse
- Find out how to use the lakehouse architecture to support ML and AI applications
- Discover the benefits of a data mesh approach for addressing data ownership challenges in your organization
Einat Orr: Rethinking Data Deployment—CI/CD for Data Lakes (30 minutes) – 8:55am PT | 11:55am ET | 4:55pm UTC/GMT
- At first glance, deploying data in a data lake may seem like a one-step process: you simply add the dataset to the production location in the object store. In practice, blindly writing new data introduces a host of potential problems. For example, how do you know that the data you write is accurate and conforms to the expected format and schema? Once you’ve written data to the production location of your lake, consumers can use it immediately, so by then it’s already too late to catch errors. Einat Orr presents a new strategy for data deployment, one where new data is added in isolation, then tested and validated, before “going live” in a production table. She’ll also demonstrate how data versioning tools like lakeFS and Project Nessie can support this deployment method seamlessly with zero-copy operations.
- Einat Orr is the cofounder and CEO of Treeverse, the company behind lakeFS, an open source platform that delivers a Git-like experience to object-storage-based data lakes. Einat previously led several engineering organizations, most recently as the CTO of SimilarWeb. She holds a PhD in mathematics in the field of optimization in graph theory from Tel Aviv University.
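
The write-in-isolation, validate, then promote flow described in the session abstract above can be illustrated with a minimal sketch. This is not the speaker's method and does not use the lakeFS or Project Nessie APIs: it assumes a local directory stands in for the object store, and the `staging`/`production` layout, the `EXPECTED_COLUMNS` schema, and the `validate` and `deploy` helpers are all hypothetical names chosen for illustration.

```python
import csv
import os
import tempfile
from pathlib import Path

# Hypothetical schema that every production dataset must follow.
EXPECTED_COLUMNS = ["user_id", "event", "ts"]


def validate(path: Path) -> list[str]:
    """Return a list of problems found in a staged CSV file (empty means OK)."""
    problems = []
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected columns: {reader.fieldnames}"]
        for line_no, row in enumerate(reader, start=2):
            if not (row["user_id"] or "").isdigit():
                problems.append(f"line {line_no}: user_id is not an integer")
    return problems


def deploy(rows, lake: Path, name: str) -> bool:
    """Write rows to an isolated staging area, validate, then promote.

    Returns True if the dataset reached production, False if it was
    rejected and left in staging for inspection.
    """
    staging = lake / "staging" / name
    production = lake / "production" / name
    staging.parent.mkdir(parents=True, exist_ok=True)
    production.parent.mkdir(parents=True, exist_ok=True)

    # Step 1: write in isolation; consumers only read from production.
    with staging.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Step 2: test and validate before anything is visible to consumers.
    if validate(staging):
        return False

    # Step 3: promote. os.replace is an atomic rename on a single local
    # filesystem; it is only a stand-in for a versioning tool's promotion step.
    os.replace(staging, production)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        header = EXPECTED_COLUMNS
        ok = deploy([header, ["1", "click", "2024-01-01"]], Path(tmp), "good.csv")
        rejected = deploy([header, ["abc", "click", "2024-01-01"]], Path(tmp), "bad.csv")
        print("good.csv promoted:", ok)
        print("bad.csv promoted:", rejected)
```

The point of the design is that the validation gate sits between the write and the moment consumers can see the data. A plain rename is used here only because it is atomic on a local filesystem; object stores generally do not offer an atomic rename, which is one reason a dedicated versioning layer is attractive for the promotion step.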