Artificial Intelligence

Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle then, scaling an application from one to many GPUs should provide a tremendous performance boost.
Accelerated Computing

GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler

Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the…
Graphics / Simulation

Interactive Supercomputing with In-Situ Visualization on Tesla GPUs

So, you just got access to the latest supercomputer with thousands of GPUs. Obviously this is going to help you a lot with accelerating your scientific…

12 Things You Should Know about the Tesla Accelerated Computing Platform

You may already know NVIDIA Tesla as a line of GPU accelerator boards optimized for high-performance, general-purpose computing. They are used for parallel…
Accelerated Computing

Optimizing the High Performance Conjugate Gradient Benchmark on GPUs

[This post was co-written by Everett Phillips and Massimiliano Fatica.] The High Performance Conjugate Gradient Benchmark (HPCG) is a new benchmark intended to…
Accelerated Computing

Benchmarking GPUDirect RDMA on Modern Server Platforms

NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI…