[Open Source] Hamilton, a micro framework for creating dataframes, and its application at Stitch Fix

apply(meetup) - Feb '22 - 30 minutes

At Stitch Fix, we have 130+ “Full Stack Data Scientists” who, in addition to doing data science work, are also expected to engineer and own data pipelines for their production models. One data science team, the Forecasting, Estimation, and Demand team, was in a bind: their data generation process was slowing iteration and causing operational headaches as they delivered time-series forecasts for the business. In this talk I’ll present Hamilton, a novel open source Python micro framework that solved their pain points by changing their working paradigm.

Specifically, Hamilton enables a simpler paradigm for Data Science & Data Engineering teams to create, maintain, and execute code for generating dataframes, especially when there are lots of inter-column dependencies. Hamilton does this by building a DAG of dependencies directly from Python functions defined in a special manner, which also makes unit testing and documentation easy; tune in to the talk to find out how. I’ll also cover our experience migrating to it, best practices from using it in production for over two years, and planned extensions to make it a general purpose framework.
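As a rough illustration of the idea only (this is not Hamilton's actual API or implementation), the sketch below assumes one plausible convention: each function's name is the column it produces, and its parameter names are the columns it depends on. All function names and sample values are hypothetical, and plain Python lists stand in for dataframe columns to keep the example dependency-free.

```python
import inspect

# Hypothetical column definitions. The function name is the output column;
# the parameter names are the columns (or values) it depends on.
def spend_per_signup(spend, signups):
    """Marketing spend divided by signups, row by row."""
    return [s / n for s, n in zip(spend, signups)]

def spend_mean(spend):
    """Mean spend across all rows."""
    return sum(spend) / len(spend)

def spend_zero_mean(spend, spend_mean):
    """Spend with its mean subtracted."""
    return [s - spend_mean for s in spend]

def resolve(name, functions, inputs, cache):
    """Compute one node of the DAG, resolving its dependencies first."""
    if name not in cache:
        if name in inputs:
            # Raw input column supplied by the caller.
            cache[name] = inputs[name]
        else:
            fn = functions[name]
            # Dependencies are read straight from the function signature.
            deps = inspect.signature(fn).parameters
            kwargs = {dep: resolve(dep, functions, inputs, cache) for dep in deps}
            cache[name] = fn(**kwargs)
    return cache[name]

def build_columns(functions, inputs, outputs):
    """Return only the requested output columns, computing each node once."""
    by_name = {fn.__name__: fn for fn in functions}
    cache = {}
    return {col: resolve(col, by_name, inputs, cache) for col in outputs}

result = build_columns(
    functions=[spend_per_signup, spend_mean, spend_zero_mean],
    inputs={"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 5]},
    outputs=["spend_per_signup", "spend_zero_mean"],
)
print(result)
```

Because every column is an ordinary function with a docstring, it can be unit tested by calling it directly, e.g. `spend_zero_mean([1.0, 3.0], 2.0)`. The sketch omits cycle detection and error handling for unknown column names.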

Stefan Krawczyk

Manager, Data Platform

Stitch Fix

Stefan loves the stimulus of working at the intersection of design, engineering, and data. He grew up in New Zealand, speaks Polish, and spent formative years at Stanford, LinkedIn, Nextdoor & Idibon. He currently leads the Model Lifecycle Team at Stitch Fix. Outside of work in a pre-covid world, Stefan liked to swim, eat tacos, drink beer, and travel; for the past year, he has instead biked, eaten tacos, and baked sourdough.