The Only Truly Hard Problem in MLOps

apply() 2021 - 30 minutes

MLOps solutions are often presented as addressing particularly challenging problems. This is mostly untrue. Most of the problems that MLOps solutions address originated in pre-ML data processing systems, and they are well handled by the solutions we devised for those systems. Data ingestion, feature storage, model serving, and even model management and training are all reasonably well addressed by traditional data processing approaches. All of these can be solved with little or no understanding of the modeling problem your system is designed to solve.

The only truly hard, ML-centric problem in MLOps is in the data. There is no automated, general way to mitigate the impact of subtle changes in the distribution of training data, or of undetected changes in the semantics of that data. A general solution to this problem would unlock more usability and trust in ML than any other improvement we could make.

Todd Underwood

Todd Underwood is a Director at Google, where he leads Machine Learning Site Reliability Engineering (ML SRE). He is also Site Lead for Google’s Pittsburgh engineering office. ML SRE teams build and scale internal and external AI/ML services and are critical to almost every Product Area at Google.

Before joining Google, Todd held a variety of roles at Renesys (since acquired by Oracle). He led operations, security, and peering for Renesys’s Internet intelligence services, which are now part of Oracle’s Cloud service. He also did product management for some early social products that Renesys worked on. Before that, Todd was Chief Technology Officer of Oso Grande, an independent Internet service provider (AS2901) in New Mexico.