Towards a Unified Real-Time ML Data Pipeline, from Training to Serving

apply() 2021 - 30 minutes

On a global marketplace like Etsy, where buyers come to find unique, varied items from sellers around the world, the item inventory is constantly changing. User preferences also shift in real time as buyers discover the latest selection offered on the site. In such a dynamic environment, Machine Learning models for applications such as search, recommendations, and computational advertising need to collect a variety of real-time data signals, process them, and leverage them to make the most relevant predictions.

In this talk, we will detail how we use real-time feature logging to capture in-session and trending activity, build a typed, unified feature store for sharing features across models from different domains, and serve feature data at scale, with the eventual goal of powering reactive systems. Finally, we will show how such a real-time ML data pipeline can be leveraged to build reactive systems (bandit, reinforcement learning, and online learning applications) that use state-of-the-art ML algorithms to learn from user actions in real time.


Aakash Sabharwal

Senior Engineering Manager

Etsy

Aakash is a Senior Engineering Manager in Etsy’s Data Science and Machine Learning group, where he leads the Machine Learning Systems group. His teams focus on building scalable, efficient real-time systems that allow Etsy to leverage its vast quantities of marketplace data for search, advertising, and recommendation applications.

Since the start of his career, Aakash has been involved with several startups, including Ooyala (acquired by Telstra), Platfora (acquired by Workday), Quantifind, and finally Blackbird, which was acquired by Etsy. At all of these companies, his work has been at the intersection of Data Science, Machine Learning, and Distributed Systems. Aakash holds a degree in Computer Science from Carnegie Mellon.

Sheila Hu

Senior Machine Learning Engineer

Etsy

Sheila is a Senior Machine Learning Engineer in the Machine Learning Systems group and the tech lead of the machine learning pipeline squad. She is extensively involved in designing and building the data ecosystem that aims to formalize and expedite Etsy’s machine learning development. Over the last two years, she has worked on building feature stores and data pipelines at Etsy. Before joining Etsy, Sheila completed graduate studies in Operations Research at Columbia University.