We’re getting the group back together in San Francisco! The Data Council Community is meeting up to discuss the latest in data infrastructure.
This meeting will be hosted from 6 – 8 PM at the San Francisco Mindspace HQ, just a 2-minute walk from the Montgomery BART station. Each talk will be about 20 minutes, with time for discussion and questions. Come see your old friends and make some new ones. Food and drinks are on us!
Talk details below
Talk #1: Develop Spark ETL pipelines with no risk against production data
Abstract: Delivering high-quality data products requires strict testing of pipelines before deploying them to production. Today, to test against quality data, one either has to use a subset of the data or is forced to create multiple copies of the entire data set. Testing against sample data is not good enough. The alternative, however, is costly and time-consuming.
We will demonstrate how to test against the entire production data set with zero copying. You will learn how to:
- Set up your environment in under 5 minutes
- Create multiple isolated testing environments without copying data
- Easily run multiple tests on your environment using git-like operations (such as commit, branch, revert, etc.)
Speaker: Vinodhini SD, Developer Advocate @ lakeFS
Vinodhini is a developer advocate at Treeverse, the creators of the open-source project lakeFS. She is a data practitioner with 6+ years of experience working for companies like Apple, Nike and NetApp, and has a blend of data engineering, machine learning and business analytics expertise.
Talk #2: Hub and Spoke for Machine Learning Results
Abstract: It’s no secret that determining MLOps best practices is often one of the biggest pain points for data teams. To deliver true value from ML solutions, data teams have to be able not just to develop these solutions, but also to scale them between teams and systems. The truth is that brittle, bespoke Python code at the end of an ML model just won’t cut it (as many of you already know all too well). In this talk, Donny Flynn, Customer Data Architect at reverse ETL pioneer Census, will break down what data hub-centric MLOps entails, some of the horror stories that can derail you, and how to orchestrate your pipelines to best help operational business users.
Speaker: Donny Flynn, Customer Data Architect @ Census
Donny is a Customer Data Architect at Census, primarily responsible for working with prospects and Census customers who are exploring and implementing Operational Analytics within their organizations. He was previously Head of Data at both Owner.com and Chiper, starting from a data science role and moving towards the analytics side of the data house. Donny is a fun karaoke participant and a flat-out fanatic for Chicago sports: Blackhawks, Cubs, Bears, and Bulls.
Please RSVP so we have your info for check-in.
See you there!