Profiling
Jun 28, 2022
Advanced API Performance: SetStablePowerState
This post covers best practices for using SetStablePowerState on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API...
2 MIN READ
Jan 27, 2022
Advanced Kernel Profiling with the Latest Nsight Compute
NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user...
4 MIN READ
Jul 16, 2019
TensorFlow Performance Logging Plugin nvtx-plugins-tf Goes Public
The new nvtx-plugins-tf library enables users to add performance logging nodes to TensorFlow graphs. (TensorFlow is an open source library widely used for...
7 MIN READ
Apr 23, 2019
NVIDIA Nsight Systems Adds Vulkan Support
Vulkan is a low-overhead, cross-platform 3D graphics and compute API targeting a wide variety of devices from cloud gaming servers, to PCs and embedded...
7 MIN READ
May 30, 2018
Nsight Systems Exposes New GPU Optimization Opportunities
As GPU performance steadily ramps up, your application may be overdue for a tune-up to keep pace. Developers have used independent CPU profilers and GPU...
6 MIN READ
Apr 05, 2016
CUDA 8 Features Revealed
Today I'm excited to announce the general availability of CUDA 8, the latest update to NVIDIA's powerful parallel computing platform and programming model. In...
19 MIN READ
Sep 29, 2015
Customize CUDA Fortran Profiling with NVTX
The NVIDIA Tools Extension (NVTX) library lets developers annotate custom events and ranges within the profiling timelines generated using tools such as the...
5 MIN READ
Sep 08, 2015
CUDA 7.5: Pinpoint Performance Problems with Instruction-Level Profiling
[Note: Thejaswi Rao also contributed to the code optimizations shown in this post.] Today NVIDIA released CUDA 7.5, the latest release of the powerful CUDA...
12 MIN READ
Jul 08, 2015
New Features in CUDA 7.5
Today I'm happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage for up to 2x larger...
12 MIN READ
May 05, 2015
GPU Pro Tip: Track MPI Calls In The NVIDIA Visual Profiler
Often when profiling GPU-accelerated applications that run on clusters, one needs to visualize MPI (Message Passing Interface) calls on the GPU timeline in the...
5 MIN READ
Feb 23, 2015
Learn GPU Computing with Hands-On Labs at GTC 2015
Every year NVIDIA’s GPU Technology Conference (GTC) gets bigger and better. One of the aims of GTC is to give developers, scientists, and practitioners...
4 MIN READ
Jan 22, 2015
GPU Pro Tip: CUDA 7 Streams Simplify Concurrency
Heterogeneous computing is about efficiently using all processors in the system, including CPUs and GPUs. To do this, applications must execute functions...
8 MIN READ
Aug 25, 2014
Remote Application Development using NVIDIA Nsight Eclipse Edition
NVIDIA Nsight Eclipse Edition (NSEE) is a full-featured unified CPU+GPU integrated development environment (IDE) that lets you easily develop CUDA applications...
13 MIN READ
Aug 04, 2014
Accelerate R Applications with CUDA
R is a free software environment for statistical computing and graphics that provides a programming language and built-in libraries of mathematics operations...
15 MIN READ
Jun 19, 2014
CUDA Pro Tip: Profiling MPI Applications
When I profile MPI+CUDA applications, sometimes performance issues only occur for certain MPI ranks. To fix these, it's necessary to identify the MPI rank where...
4 MIN READ
Jun 03, 2014
Accelerating a C++ CFD Code with OpenACC
Computational Fluid Dynamics (CFD) is a valuable tool to study the behavior of fluids. Today, many areas of engineering use CFD. For example, the automotive...
12 MIN READ