The Only Truly Hard Problem in MLOps

April 22, 9:10 am - 9:40 am (30 Minutes)


MLOps solutions are often presented as addressing particularly challenging problems. This is mostly untrue. Most of the problems that MLOps solutions tackle originated in pre-ML data processing systems, and the solutions we devised for those systems already address them well. Data ingestion, feature storage, model serving, and even model management and training are all relatively well addressed by traditional data processing approaches. All of these can be solved with little or no understanding of the modeling challenge your system is designed to solve.

The only truly hard, ML-centric problem in MLOps lies in the data. There is no automated, generalized way to mitigate the impact of subtle shifts in the distribution of training data, or of undetected changes in that data's semantics. A general solution to this problem would unlock more usability and trust in ML than any other improvement we could make.

Niall Murphy

Global Head of Azure SRE


Niall Richard Murphy has worked in Internet infrastructure since the mid-1990s. His books have sold approximately a quarter of a million copies worldwide, most notably the award-winning Site Reliability Engineering (2016), and he is probably one of the few people in the world to hold degrees in Computer Science, Mathematics, and Poetry Studies. He lives in Dublin, Ireland, with his wife and two special-needs children. He is currently a Director of Site Reliability Engineering in Microsoft Azure.

Todd Underwood



Todd Underwood is a Director at Google. He leads Machine Learning Site Reliability Engineering (ML SRE) for Google. He is also Site Lead for Google’s Pittsburgh engineering office. ML SRE teams build and scale internal and external AI/ML services and are critical to almost every Product Area at Google.

Before working at Google, Todd held a variety of roles at Renesys (since acquired by Oracle). He led operations, security, and peering for Renesys’s Internet intelligence services, which are now part of Oracle’s cloud service. He also did product management for some early social products that Renesys worked on. Before that, Todd was Chief Technology Officer of Oso Grande, an independent Internet service provider (AS2901) in New Mexico.