Programmatic Supervision for Software 2.0

apply(conf) - Apr '21 - 10 minutes

One of the major bottlenecks in the development and deployment of AI applications is the need for the massive labeled training datasets that drive modern ML approaches. These training datasets are traditionally labeled by hand at great expense in time and money, and often cannot practically be hand-labeled at all due to privacy, expertise, and/or rate-of-change requirements in real-world settings such as healthcare.

This talk will cover a range of *programmatic* approaches (often called “weak supervision”) to building, labeling, augmenting, and structuring training datasets, as well as their broader effects on end-to-end ML and AI application development. Specifically, it will cover programmatic labeling techniques, such as the data programming and Snorkel approaches; data augmentation techniques, which add transformed copies of data to a dataset to increase model robustness; data structuring or “slicing” techniques for highlighting, monitoring, and enabling models to attend to critical and/or difficult subsets of the data; and other key techniques for training data management.
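As a rough illustration of programmatic labeling, the sketch below writes a few heuristic labeling functions for a toy spam task and combines their votes. Everything in it is an assumption made for illustration: the task, the function names, and the heuristics are hypothetical, and the votes are combined by a simple majority vote rather than by the learned label model that the data programming and Snorkel approaches use. It does not use the Snorkel library.

```python
from collections import Counter

# Label values for a toy spam task (hypothetical, for illustration only).
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL-like string, else abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, else abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Vote HAM for very short messages (three words or fewer), else abstain."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(votes):
    """Combine votes by majority; abstain if there are no votes or a tie."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_dataset(texts):
    """Apply every labeling function to every text and combine the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]


if __name__ == "__main__":
    examples = [
        "Win a prize at http://x.example",
        "ok thanks",
        "See you at the meeting tomorrow afternoon",
    ]
    for text, label in zip(examples, label_dataset(examples)):
        print(label, text)
```

Messages on which every function abstains, or on which the votes tie, stay unlabeled here; a downstream model would be trained only on the examples that received a label.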

More broadly, this talk will address how these new programmatic approaches lead to a whole new end-to-end ML/AI application development process. Using the example of Snorkel Flow, a new platform for this process, I will cover these ideas and how they extend to model training, monitoring, and analysis, and to the feedback loops that lead to actionable modification or extension of the programmatic supervision approaches. The result is a more iterative, error-analysis-driven development and deployment process for ML and AI applications overall.

Alex Ratner

Co-Founder & CEO

Snorkel AI

Alex Ratner is the co-founder and CEO of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. Prior to Snorkel AI and UW, he completed his Ph.D. in CS at Stanford, advised by Christopher Ré, where he started and led the Snorkel open source project. His research there focused on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and on applying these to real-world problems in medicine, knowledge base construction, and more. Previously, he earned his A.B. in Physics from Harvard University.