The Nsight suite of profiling tools now supersedes the NVIDIA Visual Profiler (NVVP) and nvprof. Let’s look at what this means for NVIDIA Visual Profiler or nvprof users. Before diving in, let’s first review what is not changing. The Assess, Parallelize, Optimize, Deploy (“APOD”) methodology is the same. When profiling a workload you will continue to look for unnecessary synchronization events, opportunities to overlap compute with data movement, etc. The data you are already familiar with are still available, including kernel statistics and the timeline view.
So why change tools? Nsight Systems and Nsight Compute split system-level application analysis and individual CUDA kernel-level profiling into separate tools, allowing each to focus on its particular domain without compromise. The Nsight Systems GUI provides dramatic increases in responsiveness and scalability with the size of the profile, so you can visualize significantly more information at a glance from the timeline. Nsight Systems also enables a holistic view of the entire system: CPU, GPU, OS, runtime, and the workload itself, reflecting that real-world performance is multifaceted and not just a matter of making a single kernel go fast. This is all done with low-overhead profile collection and minimal perturbation of the workload.
Your profiling workflow will change to reflect the individual Nsight tools, as figure 1 shows. Start with Nsight Systems to get a system-level overview of the workload: eliminate system-level bottlenecks, such as unnecessary thread synchronization or data movement, and improve the system-level parallelism of your algorithms. Once you have done that, proceed to Nsight Compute or Nsight Graphics to optimize the most significant CUDA kernels or graphics workloads, respectively. Periodically return to Nsight Systems to ensure that you remain focused on the largest bottleneck; otherwise, the bottleneck may have shifted and your kernel-level optimizations may not deliver as large an improvement as expected.
This article describes how to get the same system-wide actionable insights that you know how to get from the NVIDIA Visual Profiler and nvprof with Nsight Systems. Check the NVIDIA Developer Blog for future posts on how to transition your kernel-level profiling to Nsight Compute from the Visual Profiler or nvprof.
Some of the Nsight Systems features used in this article require version 2019.3.6 or later. The section “How to Get Nsight Systems” at the end of this article describes how to install and set up Nsight Systems.
Sample Code
Nsight Systems enables many types of performance analysis. This article focuses on a particular case: unified memory data movement. Let’s use the vector addition code from the Even Easier Introduction to CUDA article as the starting point. A related article uses nvprof to understand why the vector addition code does not perform as expected on Pascal and later GPUs. To briefly recap: the data is initialized on the CPU, so the Page Migration Engine in Pascal and later GPUs stalls the kernel when the data is first accessed on the GPU, and the data movement time is therefore counted as part of the kernel execution time.
The article describes several solutions. Let’s use cudaMemPrefetchAsync() to move the data to the GPU after initializing it. Prefetching is controlled by an environment variable in the sample code, so we can easily toggle the behavior at runtime.
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <string.h>   // for strcmp()

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory -- accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Prefetch the data to the GPU unless __PREFETCH=off
  char *prefetch = getenv("__PREFETCH");
  if (prefetch == NULL || strcmp(prefetch, "off") != 0) {
    int device = -1;
    cudaGetDevice(&device);
    cudaMemPrefetchAsync(x, N*sizeof(float), device, NULL);
    cudaMemPrefetchAsync(y, N*sizeof(float), device, NULL);
  }

  // Run kernel on 1M elements on the GPU
  int blockSize = 256;
  int numBlocks = (N + blockSize - 1) / blockSize;
  add<<<numBlocks, blockSize>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}
We already know what the problem is in this case. We’ll focus on showing how you would use Nsight Systems to identify the issue and compare that to the Visual Profiler and nvprof.
Begin by compiling the sample code:
$ nvcc -o add_cuda add.cu
Command Line
nvprof
First, let’s profile the code with nvprof. To disable the prefetching, set the environment variable __PREFETCH=off.
$ __PREFETCH=off nvprof ./add_cuda
======== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.6865ms         1  2.6865ms  2.6865ms  2.6865ms  add(int, float*, float*)
      API calls:   95.47%  273.61ms         2  136.81ms  23.876us  273.59ms  cudaMallocManaged
                    1.59%  4.5653ms         4  1.1413ms  1.1273ms  1.1573ms  cuDeviceTotalMem
                    1.58%  4.5279ms       388  11.669us     109ns  1.4688ms  cuDeviceGetAttribute
                    0.94%  2.6913ms         1  2.6913ms  2.6913ms  2.6913ms  cudaDeviceSynchronize
                    0.29%  842.31us         2  421.16us  344.15us  498.16us  cudaFree
                    0.10%  288.73us         4  72.182us  69.703us  76.404us  cuDeviceGetName
                    0.02%  45.233us         1  45.233us  45.233us  45.233us  cudaLaunchKernel
                    0.00%  10.036us         4  2.5090us  1.2670us  5.6930us  cuDeviceGetPCIBusId
                    0.00%  2.8900us         8     361ns     143ns     977ns  cuDeviceGet
                    0.00%  1.5220us         3     507ns     144ns     752ns  cuDeviceGetCount
                    0.00%     802ns         4     200ns     169ns     241ns  cuDeviceGetUuid

======== Unified Memory profiling result:
Device "Tesla V100-PCIE-32GB (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
     138  59.362KB  4.0000KB  980.00KB  8.000000MB  988.4800us  Host To Device
      24  170.67KB  4.0000KB  0.9961MB  4.000000MB  347.1680us  Device To Host
       9         -         -         -           -  2.670272ms  Gpu page fault groups
Total CPU Page faults: 36
As you can see, the time spent in the add kernel is much greater than expected and there are many small and irregularly sized host to device data transfers. This is the same result reported in the previous article.
The add kernel time is significantly less (17.7 microseconds versus 2.68 milliseconds) after enabling prefetching. The data is transferred from the host to the device in four 2MB chunks (versus 138 memory copies ranging from 4 to 980 KB).
$ nvprof ./add_cuda
======== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  17.728us         1  17.728us  17.728us  17.728us  add(int, float*, float*)
      API calls:   95.91%  269.71ms         2  134.86ms  32.276us  269.68ms  cudaMallocManaged
                    1.68%  4.7258ms         4  1.1814ms  1.1511ms  1.2284ms  cuDeviceTotalMem
                    1.53%  4.3109ms       388  11.110us     110ns  1.2183ms  cuDeviceGetAttribute
                    0.26%  722.86us         1  722.86us  722.86us  722.86us  cudaDeviceSynchronize
                    0.25%  716.55us         2  358.27us  284.21us  432.34us  cudaFree
                    0.23%  657.90us         2  328.95us  159.43us  498.47us  cudaMemPrefetchAsync
                    0.10%  289.95us         4  72.487us  68.495us  81.224us  cuDeviceGetName
                    0.02%  45.400us         1  45.400us  45.400us  45.400us  cudaLaunchKernel
                    0.00%  9.9210us         4  2.4800us  1.3870us  4.6180us  cuDeviceGetPCIBusId
                    0.00%  3.6270us         1  3.6270us  3.6270us  3.6270us  cudaGetDevice
                    0.00%  2.7870us         8     348ns     137ns     685ns  cuDeviceGet
                    0.00%  1.5030us         3     501ns     255ns     707ns  cuDeviceGetCount
                    0.00%     794ns         4     198ns     165ns     258ns  cuDeviceGetUuid

======== Unified Memory profiling result:
Device "Tesla V100-PCIE-32GB (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
       4  2.0000MB  2.0000MB  2.0000MB  8.000000MB  704.8960us  Host To Device
      24  170.67KB  4.0000KB  0.9961MB  4.000000MB  347.3600us  Device To Host
Total CPU Page faults: 36
Nsight Systems
Nsight Systems can also generate the information needed to diagnose the issue.
The --stats=true command line option outputs profiling information similar to nvprof.
$ __PREFETCH=off nsys profile -o noprefetch --stats=true ./add_cuda
...
Generating cuda API Statistics...
cuda API Statistics

Time(%)  Time (ns)   Calls  Avg (ns)     Min (ns)  Max (ns)   Name
-------  ----------  -----  -----------  --------  ---------  ---------------------
   98.7   258039072      2  129019536.0     31645  258007427  cudaMallocManaged
    0.9     2451177      1    2451177.0   2451177    2451177  cudaDeviceSynchronize
    0.3      822298      2     411149.0    397325     424973  cudaFree
    0.0       46178      1      46178.0     46178      46178  cudaLaunchKernel

Generating cuda Kernel and Memory Operation Statistics...
cuda Kernel Statistics

Time(%)  Time (ns)  Instances  Avg (ns)   Min (ns)  Max (ns)  Name
-------  ---------  ---------  ---------  --------  --------  ----
  100.0    2600399          1  2600399.0   2600399   2600399  add

cuda Memory Operation Statistics (time)

Time(%)  Time (ns)  Operations  Avg (ns)  Min (ns)  Max (ns)  Name
-------  ---------  ----------  --------  --------  --------  ---------------------------------
   75.8    1089824         179    6088.4      2304     82432  [CUDA Unified Memory memcpy HtoD]
   24.2     348192          24   14508.0      1632     80608  [CUDA Unified Memory memcpy DtoH]

cuda Memory Operation Statistics (bytes)

Total Bytes (KB)  Operations  Avg (KB)  Min (bytes)  Max (KB)  Name
----------------  ----------  --------  -----------  --------  ---------------------------------
          8192.0         179   45.7654         4096     968.0  [CUDA Unified Memory memcpy HtoD]
          4096.0          24  170.6667         4096    1020.0  [CUDA Unified Memory memcpy DtoH]
...
The CUDA kernel and memory operation statistics match what we got from nvprof with prefetching disabled: the add kernel time is 2.6 milliseconds and there are many (179) small host to device data transfers. When prefetching is disabled, the number and size distribution of the host to device memory copies vary from run to run, so the difference between the run profiled with nvprof (138 copies averaging 59 KB) and the run profiled with Nsight Systems (179 copies averaging 46 KB) is expected.
Nsight Systems reduces profiling overhead and keeps the focus on the workload itself by only reporting the CUDA functions directly invoked by the workload. The CUDA API table does not include the unactionable CUDA driver APIs called from inside the CUDA library, such as cuDeviceGetUuid(). Driver APIs called in the workload itself would be traced by Nsight Systems, but that is not the case here.
Repeating the same Nsight Systems workflow for the case with prefetching enabled reveals a similar reduction in the add kernel time and change in the host to device data transfer behavior.
$ nsys profile -o prefetch --stats=true ./add_cuda
Generating cuda API Statistics...
cuda API Statistics

Time(%)  Time (ns)   Calls  Avg (ns)     Min (ns)  Max (ns)   Name
-------  ----------  -----  -----------  --------  ---------  ---------------------
   98.9   266741347      2  133370673.5     58932  266682415  cudaMallocManaged
    0.4     1019086      2     509543.0    420979     598107  cudaFree
    0.4      978835      1     978835.0    978835     978835  cudaDeviceSynchronize
    0.3      827827      2     413913.5    249549     578278  cudaMemPrefetchAsync
    0.0       48073      1      48073.0     48073      48073  cudaLaunchKernel

Generating cuda Kernel and Memory Operation Statistics...
cuda Kernel Statistics

Time(%)  Time (ns)  Instances  Avg (ns)  Min (ns)  Max (ns)  Name
-------  ---------  ---------  --------  --------  --------  ----
  100.0      17504          1   17504.0     17504     17504  add

cuda Memory Operation Statistics (time)

Time(%)  Time (ns)  Operations  Avg (ns)  Min (ns)  Max (ns)  Name
-------  ---------  ----------  --------  --------  --------  ---------------------------------
   67.2     709280           4  177320.0    172256    180416  [CUDA Unified Memory memcpy HtoD]
   32.8     346560          24   14440.0      1632     80192  [CUDA Unified Memory memcpy DtoH]

cuda Memory Operation Statistics (bytes)

Total Bytes (KB)  Operations  Avg (KB)  Min (bytes)  Max (KB)  Name
----------------  ----------  --------  -----------  --------  ---------------------------------
          8192.0           4    2048.0      2097152    2048.0  [CUDA Unified Memory memcpy HtoD]
          4096.0          24  170.6667         4096    1020.0  [CUDA Unified Memory memcpy DtoH]
Extending the Summary Statistics
The ability to generate custom summary reports is a very useful feature of Nsight Systems. An SQLite database with all the profiling information can be generated using the --export=sqlite command line option. You can query the database to extract additional insights. For example, a histogram of the host to device data transfers could be useful.
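For example, the export option can simply be added to the collection command used earlier; with -o noprefetch, this should leave a noprefetch.sqlite database alongside the qdrep report (the exact output naming may vary by Nsight Systems version):

$ __PREFETCH=off nsys profile -o noprefetch --export=sqlite --stats=true ./add_cuda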
The following SQL query sets up a few useful views based on the tables containing the raw memcpy and memset results (refer to the documentation folder in your Nsight Systems installation for a description of the database schema) and then outputs the histogram, including the number of instances, total time, and average bandwidth.
-- Lookup table for description of memory operation by copyKind index
-- /Documentation/nsys-exporter/exported_data.html#cuda-copykind-enum
DROP TABLE IF EXISTS MemcpyOperationStrings;
CREATE TABLE MemcpyOperationStrings (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO MemcpyOperationStrings (id, name)
VALUES
    (0, '[CUDA memcpy Unknown]'),
    (1, '[CUDA memcpy HtoD]'),
    (2, '[CUDA memcpy DtoH]'),
    (3, '[CUDA memcpy HtoA]'),
    (4, '[CUDA memcpy AtoH]'),
    (5, '[CUDA memcpy AtoA]'),
    (6, '[CUDA memcpy AtoD]'),
    (7, '[CUDA memcpy DtoA]'),
    (8, '[CUDA memcpy DtoD]'),
    (9, '[CUDA memcpy HtoH]'),
    (10, '[CUDA memcpy PtoP]'),
    (11, '[CUDA Unified Memory memcpy HtoD]'),
    (12, '[CUDA Unified Memory memcpy DtoH]'),
    (13, '[CUDA Unified Memory memcpy DtoD]');

-- type 0=memcpy, 1=memset
CREATE VIEW IF NOT EXISTS _cudaMemcpyStats AS
    SELECT 0 AS type,
           count(copyKind) AS num,
           min(end-start) AS min,
           max(end-start) AS max,
           avg(end-start) AS avg,
           sum(end-start) AS total,
           name AS Name,
           bytes
    FROM CUPTI_ACTIVITY_KIND_MEMCPY
    INNER JOIN MemcpyOperationStrings
        ON MemcpyOperationStrings.id = CUPTI_ACTIVITY_KIND_MEMCPY.copyKind
    GROUP BY copyKind, bytes;

CREATE VIEW IF NOT EXISTS _cudaMemsetStats AS
    SELECT 1 AS type,
           count(*) AS num,
           min(end-start) AS min,
           max(end-start) AS max,
           avg(end-start) AS avg,
           sum(end-start) AS total,
           '[CUDA memset]' AS Name,
           bytes
    FROM CUPTI_ACTIVITY_KIND_MEMSET
    GROUP BY bytes;

-- combined memory operations
CREATE VIEW IF NOT EXISTS _cudaMemoryOperationStats AS
    SELECT * FROM _cudaMemcpyStats
    UNION ALL
    SELECT * FROM _cudaMemsetStats;

.mode column
.headers on

SELECT bytes,
       num AS 'Count',
       total AS 'Total Time (ns)',
       ROUND(CAST(bytes AS float)/CAST(total AS float)*(1e9/1024/1024),1) AS 'Bandwidth (MB/s)'
FROM _cudaMemoryOperationStats
WHERE Name LIKE '%HtoD%'
ORDER BY bytes;
The output for the case where prefetching is disabled is:
$ sqlite3 noprefetch.sqlite < histogram.sql
bytes       Count       Total Time (ns)  Bandwidth (MB/s)
----------  ----------  ---------------  ----------------
4096        86          228448           17.1
8192        22          65856            118.6
12288       17          54048            216.8
16384       5           18176            859.7
20480       3           11552            1690.7
24576       4           17088            1371.6
28672       1           4512             6060.2
32768       2           9664             3233.7
36864       4           21056            1669.7
40960       2           11200            3487.7
45056       4           23296            1844.5
49152       5           31584            1484.1
53248       2           12992            3908.7
57344       1           6816             8023.4
61440       1           7168             8174.4
65536       4           30144            2073.4
69632       2           15680            4235.1
77824       1           8416             8818.8
86016       1           9952             8242.7
110592      1           11104            9498.3
114688      1           11456            9547.4
122880      1           12064            9713.8
196608      1           18112            10352.3
380928      1           33312            10905.4
405504      1           35552            10877.6
409600      1           35552            10987.4
458752      1           39136            11179.0
856064      1           71424            11430.4
860160      1           72192            11362.9
958464      1           79840            11448.7
991232      1           82432            11467.8
The histogram for the case when prefetching is enabled shows a single data size:
$ sqlite3 prefetch.sqlite < histogram.sql
bytes       Count       Total Time (ns)  Bandwidth (MB/s)
----------  ----------  ---------------  ----------------
2097152     4           709280           2819.8
The Nsight Systems statistics produced by --stats=true can be regenerated by running the statistics scripts bundled with Nsight Systems on the SQLite database. For example, the cudaGPUSummary script produces the CUDA kernel and memory operation tables.
Graphical User Interface
Both the NVIDIA Visual Profiler and Nsight Systems can profile a workload directly from the graphical user interface (GUI). However, here we collect the profile on the command line and import it into the GUI. This workflow is common when the workload is run on a shared, remote system and the profile is to be visualized locally.
Visual Profiler
First, collect the profile with nvprof with prefetching disabled.
$ __PREFETCH=off nvprof -o noprefetch.prof ./add_cuda
==29770== NVPROF is profiling process 29770, command: ./add_cuda
Max error: 0
==29770== Generated result file: noprefetch.prof
Transfer the file to your local system and import the nvprof profile into the NVIDIA Visual Profiler. The timeline in figure 2 shows the overlap of the host to device data movement with the add kernel, i.e., the data is being migrated as it is being accessed on the GPU.
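If the profile was collected on a remote system, copying it to your workstation could look like the following (the host name and path are placeholders):

$ scp user@remote-system:/path/to/noprefetch.prof .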
Nsight Systems
The qdrep file collected in the Command Line section can be loaded directly in the Nsight Systems GUI. (Unlike nvprof, Nsight Systems generates a profile data file, or qdrep file, by default.) The timeline view is very similar to the NVIDIA Visual Profiler. Additional information about the OS runtime libraries is also available, but it is not relevant to this particular example.
The row labeled “CUDA (Tesla V100-PCIE-32GB)” shows a high-level summary of the data movement (red) and compute kernel (blue) activity. The height of the bars indicates the relative intensity. The areas of particular interest are highlighted with an orange rectangle.
As we saw in the NVIDIA Visual Profiler, the host to device data movement overlaps with the add kernel. Normally, overlapping data movement and compute is highly desirable. In this case, however, the compute kernel stalls while it waits for the data to be moved to the GPU, which is why the compute kernel time is so much larger when prefetching is disabled. If the sample code were extended to run multiple kernels, overlapping data movement and compute would be effective; for example, the data for the next kernel could be prefetched while the current kernel is executing (see the sketch below). Figure 3 shows that each of the 179 memory copies is reported individually, rather than as a single Data Migration transaction as in the Visual Profiler.
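To make the multi-kernel idea concrete, here is a minimal sketch that is not part of the article’s sample code. It reuses the add kernel, N, blockSize, numBlocks, and the managed buffer x from the sample, and introduces two hypothetical managed buffers a and b; the data for the second launch is prefetched on a separate stream while the first launch is running.

// Sketch only: assumes add, N, blockSize, numBlocks, and the managed buffer x
// from the sample above. Buffers a and b are hypothetical additions.
float *a, *b;
cudaMallocManaged(&a, N*sizeof(float));
cudaMallocManaged(&b, N*sizeof(float));
// ... initialize a and b on the host ...

int device = -1;
cudaGetDevice(&device);

cudaStream_t compute, copy;
cudaEvent_t prefetched;
cudaStreamCreate(&compute);
cudaStreamCreate(&copy);
cudaEventCreate(&prefetched);

// Move the first kernel's data up front, then launch it on the compute stream.
cudaMemPrefetchAsync(a, N*sizeof(float), device, compute);
add<<<numBlocks, blockSize, 0, compute>>>(N, x, a);

// While the first kernel runs, prefetch the next kernel's data on a
// separate stream so the copy overlaps with the compute.
cudaMemPrefetchAsync(b, N*sizeof(float), device, copy);
cudaEventRecord(prefetched, copy);

// The second kernel must not start until its data has arrived.
cudaStreamWaitEvent(compute, prefetched, 0);
add<<<numBlocks, blockSize, 0, compute>>>(N, x, b);

cudaStreamSynchronize(compute);

Whether the copy actually hides behind the first kernel depends on the kernel runtime and transfer size; the Nsight Systems timeline is the right place to confirm the overlap.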
With prefetching enabled, the host to device data transfer occurs before the add kernel, greatly reducing the compute kernel time, as shown in figure 4. There is no longer any overlap between the data movement and the compute kernel. The unified memory pages now reside on the GPU, and any subsequent kernels could reuse them without additional data movement.
Conclusion
This simple sample code shows that the basic information obtained from NVIDIA Visual Profiler and nvprof can also be found in Nsight Systems. Despite the procedural differences, the key data used to understand the performance of a workload is the same, namely the kernel time and data transfer statistics on the command line and the timeline GUI view.
Additional features of Nsight Systems not covered here include:
- OS runtime library tracing
- Tracing of cuBLAS, cuDNN, TensorRT, and other CUDA accelerated libraries
- OpenACC tracing
- OpenGL and Vulkan tracing (DirectX 12/DXR on Windows)
- User-specified annotations using NVTX (see the sketch after this list)
- Support for any workload programming language or deep learning framework, including C/C++, Fortran, Python, Caffe, PyTorch, and TensorFlow
- Usable with MPI workloads
- Low overhead profiling with minimum workload perturbation
- Highly responsive GUI that scales with the profile size
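As a quick illustration of the NVTX annotations mentioned above, the following minimal sketch (not part of the article’s sample code) wraps the host-side initialization loop of the vector addition sample in a named range, which then appears as a labeled region on the Nsight Systems timeline. It assumes the nvToolsExt.h header shipped with the CUDA toolkit and linking with -lnvToolsExt.

#include "nvToolsExt.h"   // NVTX API, included with the CUDA toolkit

// ... inside main(), around the host-side initialization loop of the sample:
nvtxRangePushA("init x and y on host");
for (int i = 0; i < N; i++) {
  x[i] = 1.0f;
  y[i] = 2.0f;
}
nvtxRangePop();

A possible build command for this variant would be nvcc -o add_cuda add.cu -lnvToolsExt.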
How to Get Nsight Systems
Nsight Systems is included with the CUDA toolkit version 10.1 or later. You can also download the latest version of Nsight Systems from the NVIDIA Developer portal.
Refer to the User Guide for installation and setup information; in particular, you may want to add the directory containing the nsys command line tool to your PATH to most easily use Nsight Systems.
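For example, on Linux the PATH update might look like the following; the installation directory shown is only an assumption and will differ by version and platform:

$ export PATH=/opt/nvidia/nsight-systems/2019.3.6/bin:$PATH
$ nsys --version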