Self-Supervised Viewpoint Learning from Image Collections
Shalini De Mello, NVIDIA | Siva Mustikovela, University of Heidelberg
GTC 2020
Learning-based methods for viewpoint estimation of object categories (for example, faces or cars) require many images with labeled viewpoints. Viewpoint annotations are cumbersome to acquire and often contain errors. On the other hand, it is relatively easy to mine large collections of unlabeled images of a category from the internet. We investigate whether such image collections can be used to train viewpoint-estimation networks purely via self-supervision, where the only ground-truth label available is the image itself. We design a framework that follows the analysis-by-synthesis paradigm: the viewpoint network is coupled with a viewpoint-aware synthesis network, which supervises it. We additionally propose losses that enforce symmetry, realism, and better disentanglement of the image synthesizer's latent space to further supervise the viewpoint network. For faces, cars, buses, and trains, our technique performs competitively with existing fully supervised approaches.
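To make the self-supervision idea concrete, here is a minimal toy sketch in Python. It is not the networks or losses from the talk. The functions `toy_synthesize`, `toy_estimate_viewpoint`, `biased_estimate`, and `flip` are hypothetical stand-ins, an "image" is reduced to a 2D vector, and the viewpoint is a single azimuth angle. The flip-based check is only one assumed reading of what a symmetry loss could look like. The sketch shows how a synthesizer can supply a training signal without viewpoint labels: a viewpoint is sampled, an image is synthesized from it, and the estimator is penalized if it fails to recover that viewpoint.

```python
import math

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def toy_synthesize(azimuth, style):
    """Hypothetical stand-in for a viewpoint-aware synthesizer.
    The 'image' is a 2D vector; style (> 0) plays the role of appearance."""
    return (style * math.cos(azimuth), style * math.sin(azimuth))

def toy_estimate_viewpoint(image):
    """Hypothetical stand-in for the viewpoint network."""
    x, y = image
    return math.atan2(y, x)

def biased_estimate(image):
    """A deliberately flawed estimator with a constant 0.1 rad offset."""
    return toy_estimate_viewpoint(image) + 0.1

def flip(image):
    """Mirror the toy image; in this toy world mirroring negates the azimuth."""
    x, y = image
    return (x, -y)

def cycle_loss(estimator, azimuth, style):
    """Synthesize from a sampled viewpoint, then ask the estimator to recover it.
    No viewpoint label on real images is needed for this signal."""
    image = toy_synthesize(azimuth, style)
    return angular_distance(estimator(image), azimuth)

def symmetry_loss(estimator, image):
    """Assumed symmetry constraint: the estimate for a mirrored image
    should be the negated estimate for the original image."""
    return angular_distance(estimator(flip(image)), -estimator(image))

image = toy_synthesize(0.7, 2.0)
good_cycle = cycle_loss(toy_estimate_viewpoint, 0.7, 2.0)
bad_cycle = cycle_loss(biased_estimate, 0.7, 2.0)
good_sym = symmetry_loss(toy_estimate_viewpoint, image)
bad_sym = symmetry_loss(biased_estimate, image)
```

In this toy setting both losses are near zero for the consistent estimator and positive for the biased one, which illustrates why such terms can act as supervision. In a real system the stand-ins would be trainable networks and the losses would be minimized by gradient descent.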