Facebook Self-Supervised AI Outperforms State-of-the-Art Computer Vision Models

Facebook AI researchers this week announced SEER, a self-supervised model that surpasses the best existing self-supervised systems and also outperforms supervised models on tasks including image classification, object detection, and segmentation.

Combining RegNet architectures with the SwAV online clustering approach, SEER is a billion-parameter model pretrained on a billion random images.

Instead of relying on labeled datasets, self-supervised learning models for computer vision generate data labels by finding relationships between images with no annotations or metadata. Such models are considered key to developing AI with “common sense,” says Yann LeCun, Facebook AI’s chief scientist.
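To make the idea of generating labels without annotations more concrete, here is a minimal pure-Python sketch of the balanced soft-assignment step used in SwAV-style online clustering (the Sinkhorn-Knopp normalization). It is an illustration under stated assumptions, not the SEER or VISSL code: the function name `sinkhorn_codes`, the toy similarity scores, and the `epsilon` and `n_iters` values are all hypothetical choices for this example.

```python
import math

def sinkhorn_codes(scores, epsilon=0.05, n_iters=3):
    """Turn image-to-prototype similarity scores into balanced soft labels.

    scores[i][j] is the similarity of image i to cluster prototype j.
    Returns one row per image; each row sums to 1 and serves as a pseudo-label.
    Illustrative sketch only; names and defaults are assumptions.
    """
    n, k = len(scores), len(scores[0])
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Give every prototype the same total mass (1/k) across the batch,
        # which stops all images from collapsing onto a single cluster.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Give every image the same total mass (1/n).
        q = [[v / (sum(row) * n) for v in row] for row in q]
    # Rescale so that each image's assignment sums to 1.
    return [[v * n for v in row] for row in q]

# Toy batch: four images, two prototypes. Every raw score favors prototype 0.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
codes = sinkhorn_codes(toy_scores)
for row in codes:
    print([round(v, 3) for v in row])
```

Even though every image in the toy batch scores highest against the first prototype, the balancing step spreads the batch evenly across both prototypes, so the last two images end up assigned mostly to the second one. In SwAV, assignments like these, computed from one augmented view of an image, act as the training target for another view of the same image.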

After pretraining on a billion public Instagram images, SEER achieved 84.2 percent top-1 accuracy on the popular ImageNet dataset, beating state-of-the-art self-supervised systems. The researchers also trained SEER using just 10 percent of the ImageNet images and still achieved nearly 78 percent accuracy. Even when trained with just 1 percent of ImageNet, the model was over 60 percent accurate.

SEER was trained for 30 days on 512 NVIDIA V100 Tensor Core GPUs, each with 32GB of memory, said Facebook software engineer Priya Goyal. The researchers used mixed precision from the NVIDIA Apex library and gradient checkpointing tools from PyTorch to reduce memory usage and speed up training.
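For a sense of scale, the hardware figures quoted above can be turned into a rough compute budget. The short sketch below does that arithmetic. It assumes the 32GB figure is per GPU and that all 512 GPUs ran continuously for the full 30 days; the article confirms neither, and the function names are hypothetical.

```python
# Figures quoted in the article.
NUM_GPUS = 512
TRAINING_DAYS = 30
# Assumption: the 32GB figure is per GPU rather than a total.
MEMORY_PER_GPU_GB = 32

def gpu_hours(num_gpus, days):
    """Total GPU-hours, assuming every GPU runs around the clock."""
    return num_gpus * days * 24

def aggregate_memory_gb(num_gpus, memory_per_gpu_gb):
    """Combined accelerator memory across the whole cluster, in GB."""
    return num_gpus * memory_per_gpu_gb

print(gpu_hours(NUM_GPUS, TRAINING_DAYS))                 # 368640
print(aggregate_memory_gb(NUM_GPUS, MEMORY_PER_GPU_GB))   # 16384
```

Under those assumptions the run comes to roughly 370,000 GPU-hours with about 16TB of combined GPU memory, which helps explain why memory-saving techniques such as mixed precision and gradient checkpointing mattered.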

The researchers chose the RegNet architecture for its ability to scale to billions or trillions of parameters while accommodating runtime and memory constraints. The SwAV algorithm helped achieve record performance with 6x less training time.

“Eliminating the need for human annotations and metadata enables the computer vision community to work with larger and more diverse data sets, learn from random public images, and potentially mitigate some of the biases that come into play with data curation,” wrote Facebook AI in a blog post. “Self-supervised learning can also help specialize models in domains where we have limited images or metadata, like medical imaging.”

Facebook also open-sourced VISSL, the PyTorch-based general-purpose library for self-supervised learning that was used to develop SEER.