TensorRT 3: Faster TensorFlow Inference and Volta Support

Features, Deep Learning, Inference, TensorFlow, TensorRT, Volta

Nadeem Mohammad, posted Dec 04 2017

NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications.


Maximizing Unified Memory Performance in CUDA

Features, CUDA, Memory, Pascal, Unified Memory, Volta

Nadeem Mohammad, posted Nov 19 2017

Many of today’s applications process large volumes of data. GPU architectures have very fast HBM or GDDR memory, but that memory has limited capacity. Making the most of GPU performance requires keeping the data as close to the GPU as possible.


Programming Tensor Cores in CUDA 9

Features, Deep Learning, Linear Algebra, Mixed Precision, Tensor Cores, Volta

Nadeem Mohammad, posted Oct 17 2017

A defining feature of the new Volta GPU architecture is its Tensor Cores, which give the Tesla V100 accelerator a peak throughput 12 times the 32-bit floating-point throughput of the previous-generation Tesla P100.


Microsoft Releases New Version of High-Performance, Open-Source, Deep Learning Toolkit

News, Research, Higher Education / Academia, Image Recognition, Machine Learning & Artificial Intelligence, Volta

Nadeem Mohammad, posted Jun 01 2017

Previously known as CNTK, the Microsoft Cognitive Toolkit version 2.0 allows developers to create, train, and evaluate their own neural networks that can scale across multiple GPUs and multiple machines on massive data sets.
