Using CUDA Warp-Level Primitives

Accelerated Computing, Cooperative Groups, CUDA, Featured, Volta

Nadeem Mohammad, posted Jan 15 2018

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by taking advantage of warp execution.

TensorRT 3: Faster TensorFlow Inference and Volta Support

Artificial Intelligence, Deep Learning, Inference, TensorFlow, TensorRT, Volta

Nadeem Mohammad, posted Dec 04 2017

NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications.

Maximizing Unified Memory Performance in CUDA

Accelerated Computing, CUDA, Memory, Pascal, Unified Memory, Volta

Nadeem Mohammad, posted Nov 19 2017

Many of today’s applications process large volumes of data. GPU architectures have very fast HBM or GDDR memory, but its capacity is limited. Making the most of GPU performance requires keeping the data as close to the GPU as possible.

Microsoft Releases New Version of High-Performance, Open-Source, Deep Learning Toolkit

News, Research, Higher Education / Academia, Image Recognition, Machine Learning & Artificial Intelligence, Volta

Nadeem Mohammad, posted Jun 01 2017

Previously known as CNTK, the Microsoft Cognitive Toolkit version 2.0 lets developers create, train, and evaluate their own neural networks, scaling across multiple GPUs and multiple machines on massive data sets.
