Training Large-Scale Recommendation Models with TPUs

apply(conf) - May '22 - 10 minutes

At Snap, we train a large number of deep learning models every day to continuously improve ad recommendation quality for Snapchatters and provide more value to advertisers. These ad ranking models have hundreds of millions of parameters and are trained on billions of examples. Training an ad ranking model is a computation-intensive and memory-lookup-heavy task that requires a state-of-the-art distributed system and performant hardware to complete reliably and in a timely manner. This session will describe how we leveraged Google’s Tensor Processing Units (TPUs) for fast and efficient training.

Aymeric Damien

Machine Learning Engineer

Snap Inc.

Aymeric is an ML Engineer at Snap, leading various efforts to optimize Snap’s Ad-Ranking ML systems. His work includes training and inference pipeline optimization, modelling efficiency, and collaboration with Google, Nvidia, and Intel on ML hardware optimization.