GTC Silicon Valley 2019, ID S9686: Semi-supervised deep learning applications

Bryan Catanzaro (NVIDIA)
In this talk, I'll discuss several semi-supervised learning applications from our recent work in applied deep learning research at NVIDIA. I'll first discuss video translation, which renders new scenes using models learned from real-world videos. We take real-world videos, analyze them using existing computer vision techniques such as pose estimation or semantic segmentation, and then train generative models to invert these poses or segmentations back to videos. In deployment, we then render novel sketches using these models. I'll then discuss work on large-scale language modeling, where a model trained to predict text, piece by piece, on a large dataset is then fine-tuned with small amounts of labeled data to solve problems like emotion classification. Finally, I'll discuss WaveGlow, our flow-based generative model for the vocoder stage of speech synthesis, which combines a simple log-likelihood-based training procedure with very fast and efficient inference. Because semi-supervised learning lets us tackle problems where large amounts of labels would be prohibitively expensive to create, it widens the scope of problems to which we can apply machine learning.
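To make the "log-likelihood based training" of a flow-based model a little more concrete, here is a minimal sketch in plain Python. It assumes a single affine coupling layer, a common building block of normalizing flows; the abstract does not say which layers WaveGlow uses, so this is an illustration of the general idea and not WaveGlow's implementation. The names `coupling_forward`, `coupling_inverse`, `log_likelihood`, and `toy_scale_shift` are hypothetical, and `toy_scale_shift` stands in for what would normally be a neural network. Inputs are assumed to have even length.

```python
import math


def toy_scale_shift(xa):
    # Hypothetical stand-in for a learned network: maps the untouched half
    # of the input to a log-scale and a shift for the other half.
    log_s = [0.5 * math.tanh(a) for a in xa]
    t = [0.1 * a for a in xa]
    return log_s, t


def coupling_forward(x, scale_shift):
    # Data -> latent. The first half passes through unchanged; the second
    # half is scaled and shifted using values computed from the first half.
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = scale_shift(xa)
    zb = [b * math.exp(ls) + ti for b, ls, ti in zip(xb, log_s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    log_det = sum(log_s)
    return xa + zb, log_det


def coupling_inverse(z, scale_shift):
    # Latent -> data. Exact inverse of coupling_forward; this is the
    # direction used to generate samples.
    half = len(z) // 2
    za, zb = z[:half], z[half:]
    log_s, t = scale_shift(za)
    xb = [(b - ti) * math.exp(-ls) for b, ls, ti in zip(zb, log_s, t)]
    return za + xb


def log_likelihood(x, scale_shift):
    # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|.
    z, log_det = coupling_forward(x, scale_shift)
    log_pz = sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in z)
    return log_pz + log_det
```

Training such a model amounts to maximizing `log_likelihood` over the data with respect to the parameters inside the scale-and-shift network. Generation draws `z` from a standard Gaussian and applies `coupling_inverse` once, with no sample-by-sample loop, which is one reason flow-based models can offer fast inference.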
