One of the major bottlenecks in the development and deployment of AI applications is the need for the massive labeled training datasets that drive modern ML approaches. These datasets are traditionally labeled by hand at great cost in time and money, and often cannot practically be hand-labeled at all due to privacy, expertise, and/or rate-of-change requirements in real-world settings such as healthcare.
This talk will cover a range of *programmatic* approaches (often called “weak supervision”) to building, labeling, augmenting, and structuring training datasets, as well as their broader effects on end-to-end ML and AI application development. Specifically, it will cover programmatic labeling techniques, such as data programming and the Snorkel approach; data augmentation techniques, which add transformed copies of data points to a dataset to increase model robustness; data structuring or “slicing” techniques for highlighting, monitoring, and enabling models to attend to critical and/or difficult subsets of the data; and other key techniques for training data management.
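To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Snorkel API: the spam/ham task, the three labeling functions, and the `majority_vote` helper are all hypothetical. It also swaps in a simple majority vote where data programming would instead learn a model of the labeling functions' accuracies.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each labeling function encodes one noisy heuristic and may abstain.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text):
    """Combine labeling function votes; abstain on no votes or on a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


if __name__ == "__main__":
    for example in [
        "Claim your prize at https://example.com now please",
        "ok thanks",
        "The meeting moved to Tuesday afternoon",
    ]:
        print(example, "->", majority_vote(example))
```

Examples on which every function abstains stay unlabeled, and conflicting votes are resolved here by a crude tie rule. Handling that noise and conflict in a more principled way is what the programmatic labeling approaches covered in this talk address.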
More broadly, this talk will address how these programmatic approaches lead to a new end-to-end ML/AI application development process. Using the example of Snorkel Flow, a new platform for this process, I will cover how these ideas extend to model training, monitoring, and analysis, and to the feedback loops that lead to actionable modification or extension of the programmatic supervision. The result is a more iterative, error-analysis-driven development and deployment process for ML and AI applications.