NVIDIA’s GPU Technology Conference (GTC) is the premier AI conference, offering hundreds of workshops, sessions, and keynotes hosted by organizations such as Google, Amazon, and Facebook, as well as rising startups.
GTC showcases the latest breakthroughs in AI training and inference, industry-changing technologies, and successful implementations from research to production.
In the video below, see the top 5 deep learning sessions you can attend at the conference.
5 – TensorRT Inference with TensorFlow
We’ll explain how to use TensorRT via TensorFlow and/or TensorFlow Serving. TensorFlow is a flexible, high-performance software library for numerical computation using data flow graphs, and NVIDIA TensorRT is a platform for high-performance deep learning inference. We’ll describe how TensorRT is integrated with TensorFlow, show how combining the two improves the efficiency of machine learning models, and walk through examples of the integration in practice.
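As a rough illustration of the workflow the session covers, the snippet below sketches converting a SavedModel with the TF-TRT converter API from TensorFlow 1.14. It requires a TensorRT-enabled TensorFlow build and an NVIDIA GPU, and the model paths are hypothetical placeholders, not from the session:

```python
# Sketch: optimize a TensorFlow SavedModel with TF-TRT, then serve it with
# TensorFlow Serving. Requires a TensorRT-enabled TensorFlow build and a GPU;
# the paths below are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/models/resnet50/1",  # hypothetical model path
    precision_mode="FP16",                       # or "FP32" / "INT8"
)
converter.convert()  # replaces supported subgraphs with TensorRT ops
converter.save("/models/resnet50_trt/1")  # point TensorFlow Serving here
```

Unsupported operations remain as ordinary TensorFlow ops, so the converted graph still runs end to end even when only parts of it are accelerated by TensorRT.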
4 – Cloud Native ML with Kubeflow and TensorRT
Building machine learning pipelines is challenging. Doing so in a portable way that supports multi-cloud deployments is even harder. We’ll discuss Kubeflow, an open source project designed to let data scientists and machine learning engineers focus on building great ML solutions instead of setting up and managing infrastructure. We’ll detail the latest version of Kubeflow and its integration with NVIDIA’s TensorRT Inference Server.
3 – Training ImageNet in Four Minutes
We’ll discuss how we built a highly scalable deep learning training system and trained ImageNet in four minutes. For dense GPU clusters, we optimize the training system with a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy. We also propose an optimization approach for extremely large mini-batch sizes (up to 64K) that can train CNN models on the ImageNet dataset without losing accuracy, along with highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster of 1,024 Tesla P40 GPUs. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes using 2,048 Tesla P40 GPUs. When training AlexNet for 95 epochs, our system achieves 58.7% top-1 test accuracy within 4 minutes using 1,024 Tesla P40 GPUs, outperforming all existing systems.
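The all-reduce primitive at the heart of this kind of distributed training can be illustrated with a minimal pure-Python ring all-reduce (reduce-scatter followed by all-gather). This is a sketch of the classic algorithm for intuition only, not the speakers' optimized implementation; real systems such as NCCL run these transfers concurrently across GPUs:

```python
# Ring all-reduce sketch: each of n workers holds a vector; after a
# reduce-scatter phase and an all-gather phase, every worker holds the
# elementwise sum. Workers are simulated sequentially here; in a real
# cluster each step's sends happen in parallel over the ring.

def ring_allreduce(buffers):
    n = len(buffers)                      # number of workers
    chunks = [list(b) for b in buffers]   # each worker's local copy
    size = len(chunks[0])
    assert size % n == 0, "vector length must divide evenly into n chunks"
    c = size // n                         # elements per chunk

    def get_chunk(w, k):
        return chunks[w][k * c:(k + 1) * c]

    # Phase 1: reduce-scatter. In step s, worker w sends chunk (w - s) mod n
    # to its ring neighbor (w + 1), which accumulates it. After n - 1 steps,
    # worker w holds the complete sum for chunk (w + 1) mod n.
    for s in range(n - 1):
        for w in range(n):
            k = (w - s) % n
            vals = get_chunk(w, k)
            dst = (w + 1) % n
            for i in range(c):
                chunks[dst][k * c + i] += vals[i]

    # Phase 2: all-gather. Circulate the fully reduced chunks around the
    # ring, overwriting each neighbor's stale copy.
    for s in range(n - 1):
        for w in range(n):
            k = (w + 1 - s) % n
            chunks[(w + 1) % n][k * c:(k + 1) * c] = get_chunk(w, k)

    return chunks
```

Each worker sends and receives only `2 * (n - 1) / n` times the vector size in total, independent of the number of workers, which is why ring-based all-reduce scales well to large clusters.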
2 – Inference at Reduced Precision on GPUs
Although neural network training is typically done in either 32- or 16-bit floating point formats, inference can be run at even lower precisions that reduce memory footprint and elapsed time. We’ll describe quantizing neural network models for various image tasks (classification, detection, segmentation) and natural language processing tasks. In addition to convolutional feed-forward networks, we will cover quantization of recurrent models. The discussion will examine both floating point and integer quantizations, targeting features in Volta and Turing GPUs.
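To make the integer case concrete, here is a minimal sketch of symmetric INT8 quantization of a weight tensor: the float range [-amax, amax] is mapped onto the integer range [-127, 127] with a single scale factor. This is a generic illustration, not the session's specific method, and it leaves aside calibration strategies for choosing activation ranges:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale maps |x|max to 127."""
    amax = np.abs(x).max()
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the INT8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2
```

The INT8 tensor uses a quarter of the memory of the FP32 original, and the worst-case rounding error is half the quantization step, which is why quantization works well when the value distribution is not dominated by a few extreme outliers.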
1 – Training AI Models Faster With Distributed Training in PyTorch
Because deep learning is largely tolerant of reduced operator precision, there is potential for significant gains in performance and memory usage when training deep learning models. Learn how to take advantage of mixed-precision training in PyTorch to realize these gains.
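A key ingredient of mixed-precision training is loss scaling, which keeps small gradients from underflowing to zero in FP16. The toy example below demonstrates the numerical effect with numpy rather than PyTorch (where utilities such as NVIDIA's Apex amp handle this automatically); the gradient value and scale factor are illustrative:

```python
import numpy as np

# Why mixed-precision training uses loss scaling: small gradients underflow
# to zero in FP16 (whose smallest subnormal is about 6e-8), but scaling the
# loss, and hence the gradients, keeps them representable. The optimizer
# then unscales in FP32 before applying the weight update.

grad_fp32 = np.float32(1e-8)               # a small but meaningful gradient

naive = np.float16(grad_fp32)              # naive cast underflows to zero

scale = np.float32(65536.0)                # a typical power-of-two loss scale
grad_fp16 = np.float16(grad_fp32 * scale)  # representable after scaling
recovered = np.float32(grad_fp16) / scale  # unscale in FP32 for the update
```

Power-of-two scale factors are preferred because multiplying and dividing by them changes only the floating-point exponent, introducing no additional rounding error.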