Third Generation Production ML Architectures: Lessons from History, Experiences with Ray

apply(conf) - Apr '21 - 30 minutes

Production ML architectures (deployed at scale in production) are evolving at a rapid pace. We suggest there have been two generations so far: the first generation consisted of fixed-function pipelines with predetermined stages; the second generation introduced pluggable components that offered somewhat more flexibility but remained quite constrained. If history is a guide (especially the evolution of GPU APIs), the third generation will come from making the underlying computational power directly accessible and flexible.

We share our experiences with Ray, a system that makes distributed computing accessible and flexible. We give a two-slide introduction to Ray, and show how Ray’s flexibility enables approaches like online reinforcement learning that do not fit into existing production ML architectures without serious shoehorning.

We then outline how different companies (such as Uber, Ant Financial, McKinsey) are using Ray in a way that extends beyond the constraints of existing second generation architectures.

Dr. Waleed Kadous

Head of Engineering


Dr. Waleed Kadous leads engineering at Anyscale, the company behind the open source project Ray. Prior to Anyscale, Waleed worked at Uber, where he led overall system architecture, evangelized machine learning, and led the Location and Maps teams. He previously worked at Google, where he founded the Android Location and Sensing team, responsible for the “blue dot” as well as ML algorithms underlying products like Google Fit.