NVIDIA SDK Updated With New Releases of TensorRT, CUDA, and More

At NIPS 2017, NVIDIA announced new software releases for deep learning and HPC developers. The latest SDK updates bring new capabilities and performance optimizations to TensorRT and the CUDA Toolkit, and introduce CUTLASS, a new open source library project.
Here’s a detailed look at each of the software updates and the benefits they bring to developers and end users:

TensorRT 3

The TensorRT 3 production release is now available as a free download to all members of the NVIDIA Developer Program. Highlights from this release include:

Deliver up to 3.7x faster inference on Tesla V100 vs. Tesla P100 at under 7 ms real-time latency
Optimize and deploy TensorFlow models up to 18x faster compared to TensorFlow framework inference on Tesla V100
Improve productivity with an easy-to-use Python API

Technical blog:

TensorRT 3: Faster TensorFlow Inference and Volta Support

TensorRT Performance whitepaper:

GPU Inference Performance Study

Learn more and download TensorRT >>

TensorRT Container on NVIDIA GPU Cloud (NGC)

A new TensorRT inference container is available on NGC. It includes the latest TensorRT 3 release, a sample REST server for cloud inference, and a sample Open Neural Network Exchange (ONNX) model parser.
Sign up for an NGC account to get free access to the TensorRT container for your desktop with a TITAN GPU or for NVIDIA Volta-enabled P3 instances on Amazon EC2.
Technical blog:

RESTful Inference with the TensorRT Container and NVIDIA GPU Cloud

CUTLASS

CUDA Templates for Linear Algebra Subroutines, or CUTLASS, is a CUDA C++ template library that offers a high-level interface and building blocks for implementing fast and efficient GEMM (GEneral Matrix Multiplication) operations for HPC and deep learning applications. CUTLASS is available as an open source project on GitHub. It is still under development and has been open sourced for feedback and testing; it is not yet ready for production use.
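For readers unfamiliar with the term, the GEMM operation is conventionally defined as C = alpha * A * B + beta * C. The snippet below is a minimal CPU reference sketch of that definition in plain C++. It is not CUTLASS code and does not use the CUTLASS API; the function name `gemm_reference` and the row-major layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of CUTLASS).
// Computes C = alpha * A * B + beta * C with row-major storage.
// A is M x K, B is K x N, C is M x N.
void gemm_reference(std::size_t M, std::size_t N, std::size_t K,
                    float alpha,
                    const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta,
                    std::vector<float>& C) {
    // Check that each buffer matches its declared shape.
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);

    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Scale the product and blend it with the existing C entry.
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A GPU library gets its speed from how this triple loop is split across threads and memory levels, not from changing the arithmetic, so a slow reference like this one is typically useful for checking the results of an optimized kernel.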
Technical blog:

CUTLASS: Programming Fast Linear Algebra Kernels in CUDA C++

CUDA 9.1

Available later this month, CUDA 9.1 will bring new algorithms and optimizations that speed up AI and HPC apps on Volta GPUs. Highlights include:

Develop image augmentation algorithms for deep learning easily with new functions in NVIDIA Performance Primitives
Run batched neural machine translation and sequence modeling operations on Volta Tensor Cores using new APIs in cuBLAS
Solve large 2D and 3D FFT problems more efficiently on multi-GPU systems with new heuristics in cuFFT
Launch CUDA kernels up to 12x faster with new performance optimizations
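As background for the batched cuBLAS highlight above, a batched matrix operation applies the same routine to many small, independent problems in a single call. The snippet below is a CPU sketch of a batched matrix multiply in plain C++. It is not the cuBLAS API; the function name `batched_matmul_reference` and the contiguous, row-major batch layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of cuBLAS).
// For each b in [0, batch): C_b = A_b * B_b, where A_b is M x K,
// B_b is K x N, and C_b is M x N. Matrices are row-major and the
// batch entries are stored back to back in each buffer.
void batched_matmul_reference(std::size_t batch,
                              std::size_t M, std::size_t N, std::size_t K,
                              const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::vector<float>& C) {
    assert(A.size() == batch * M * K);
    assert(B.size() == batch * K * N);
    assert(C.size() == batch * M * N);

    for (std::size_t b = 0; b < batch; ++b) {
        // Offsets of the b-th matrix inside each flat buffer.
        const std::size_t a_off = b * M * K;
        const std::size_t b_off = b * K * N;
        const std::size_t c_off = b * M * N;
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k) {
                    acc += A[a_off + i * K + k] * B[b_off + k * N + j];
                }
                C[c_off + i * N + j] = acc;
            }
        }
    }
}
```

Each batch entry is independent of the others, which is what lets a GPU library process many of them concurrently instead of paying the launch overhead once per small matrix.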

Register for the NVIDIA Developer Program to be notified when CUDA 9.1 is available for download >>

JetPack 3.2

The JetPack 3.2 Developer Preview is now available. With the update to TensorRT 3.0, JetPack now supports TensorFlow models and delivers up to 15% better performance per watt for deep learning applications. In addition, the new L4T kernel supports Docker, and JetPack now supports Ubuntu 16.04 on your host PC.
Download JetPack 3.2 >>