GTC-DC 2019: Tackling Data Scarcity and Bias in Deep Learning

Anima Anandkumar, NVIDIA

We’ll explain how to ease the difficulty of obtaining the large labeled datasets that modern deep learning often requires for training. Our methods reduce data requirements and quantify bias in datasets. Techniques such as active learning, crowdsourcing, semi-supervised learning, and structured learning significantly reduce sample complexity. We’ll also discuss current methods for detecting examples that are semantically ambiguous and error-prone for humans to classify. We’ll show that model confidence correlates poorly with human visual hardness, and we’ll propose a new score that correlates strongly with it. These techniques will improve the deployment of deep learning in real-world applications.
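
As a rough illustration of how active learning can cut labeling cost, the sketch below ranks unlabeled examples by the entropy of a model's predicted class probabilities and picks the most uncertain ones to send for labeling. This is generic uncertainty sampling, not necessarily the method presented in the talk; the function names, the toy probabilities, and the budget are all assumptions made for the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Return indices of the `budget` most uncertain unlabeled examples.

    `predictions` is a list of per-example class probability lists
    (hypothetical model outputs); higher entropy means less certainty.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]

# Made-up model outputs for four unlabeled examples over three classes.
toy_predictions = [
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two examples a human annotator would be asked to label.
chosen = select_for_labeling(toy_predictions, budget=2)
print(chosen)
```

Only the selected examples are labeled and added to the training set before the model is retrained, so annotation effort goes to the examples the model is least sure about instead of being spread evenly over the whole pool.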