Performance Optimization

Oct 02, 2023
Accelerated Vector Search: Approximating with RAPIDS RAFT IVF-Flat
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...
15 MIN READ
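The post above contrasts approximate search with exhaustive (brute-force) kNN. As a rough illustration of why the exact approach is costly, here is a minimal NumPy sketch (not taken from the post; sizes and array names are illustrative) that compares every query against every database vector before picking the k nearest:

```python
import numpy as np

# Illustrative sizes; real vector-search workloads are much larger.
n_database, n_queries, dim, k = 100_000, 10, 128, 5
rng = np.random.default_rng(0)
database = rng.standard_normal((n_database, dim), dtype=np.float32)
queries = rng.standard_normal((n_queries, dim), dtype=np.float32)

# Brute-force (exact) kNN: every query is compared against every database
# vector, so the cost grows as O(n_queries * n_database * dim).
# Squared distances via ||q - x||^2 = ||q||^2 - 2*q.x + ||x||^2
d2 = (
    (queries ** 2).sum(axis=1, keepdims=True)
    - 2.0 * (queries @ database.T)
    + (database ** 2).sum(axis=1)
)
# Indices of the k smallest distances per query (unordered within the top k).
neighbors = np.argpartition(d2, k, axis=1)[:, :k]
print(neighbors.shape)  # (n_queries, k)
```

Approximate indexes such as IVF-Flat avoid this full scan by probing only a subset of the database per query, trading a little recall for much lower cost.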

Sep 11, 2023
Accelerating Vector Search: Fine-Tuning GPU Index Algorithms
In this post, we dive deeper into each of the GPU-accelerated indexes mentioned in part 1 and give a brief explanation of how the algorithms work, along with a...
12 MIN READ

Sep 11, 2023
Accelerating Vector Search: Using GPU-Powered Indexes with RAPIDS RAFT
In the AI landscape of 2023, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic...
11 MIN READ

Sep 06, 2023
GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations
Running extract-transform-load (ETL) operations on GPUs at large scale with the NVIDIA RAPIDS Accelerator for Apache Spark can produce both cost savings...
8 MIN READ

Jul 17, 2023
GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...
7 MIN READ
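The two ETL posts above center on the NVIDIA RAPIDS Accelerator for Apache Spark. As a rough sketch of how the plugin is typically enabled from PySpark (the jar and data paths are illustrative, and cluster-specific setup covered in the posts is omitted):

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming the rapids-4-spark plugin jar is already available
# on the cluster; the jar and data paths below are illustrative.
spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    .config("spark.jars", "/opt/sparkRapidsPlugin/rapids-4-spark.jar")  # illustrative path
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # load the RAPIDS Accelerator
    .config("spark.rapids.sql.enabled", "true")  # run supported SQL operators on the GPU
    .getOrCreate()
)

# A typical ETL-style job; supported operators execute on the GPU when the
# plugin is active, and unsupported ones fall back to the CPU.
df = spark.read.parquet("/data/events.parquet")  # illustrative input
out = df.groupBy("user_id").count()
out.write.mode("overwrite").parquet("/data/events_by_user.parquet")
```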

Jul 11, 2023
Accelerated Data Analytics: Machine Learning with GPU-Accelerated Pandas and Scikit-learn
If you are looking to take your machine learning (ML) projects to new levels of speed and scalability, GPU-accelerated data analytics can help you deliver...
14 MIN READ
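The post above covers GPU-accelerated pandas- and scikit-learn-style workflows with RAPIDS. A minimal sketch of that pattern, assuming cuDF and cuML are installed (the file and column names are illustrative):

```python
import cudf
from cuml.linear_model import LinearRegression

# cuDF mirrors the pandas API, but the DataFrame lives in GPU memory.
df = cudf.read_csv("sales.csv")  # illustrative file
df = df.dropna(subset=["price", "units"])

X = df[["price"]]
y = df["units"]

# cuML mirrors the scikit-learn estimator API (fit/predict) on the GPU.
model = LinearRegression()
model.fit(X, y)
df["predicted_units"] = model.predict(X)
print(df.head())
```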

Jul 10, 2023
In-Game GPU Profiling for DirectX 12 Using SetBackgroundProcessingMode
If you are a DirectX 12 (DX12) game developer, you may have noticed that GPU times displayed in real time in your game HUD may change over time for a given...
4 MIN READ

Jun 28, 2023
Improving GPU Performance by Reducing Instruction Cache Misses
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large number of compute resources, called streaming...
11 MIN READ

Jun 07, 2023
Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost
Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may...
12 MIN READ

Jun 05, 2023
CUDA 12.1 Supports Large Kernel Parameters
CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...
5 MIN READ

May 09, 2023
NVIDIA On-Demand: RAPIDS Sessions from GTC 2023
Get the latest best practices for accelerating your data science projects with RAPIDS.
1 MIN READ

May 05, 2023
NVIDIA On-Demand: Top Data Science Sessions from GTC 2023
Learn from experts about how to optimize a data pipeline or use machine learning for anomaly detection with these 15 educational sessions.
1 MIN READ

Apr 27, 2023
End-to-End AI for NVIDIA-Based PCs: Optimizing AI by Transitioning from FP32 to FP16
This post is part of a series about optimizing end-to-end AI. The performance of AI models is heavily influenced by the precision of the computational resources...
4 MIN READ
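The post above discusses moving inference from FP32 to FP16. As one illustrative conversion path for an ONNX model (not necessarily the exact workflow the post uses), the onnxconverter-common package can rewrite the model's tensors to half precision:

```python
import onnx
from onnxconverter_common import float16

# Load an FP32 ONNX model; the file names here are illustrative.
model_fp32 = onnx.load("model_fp32.onnx")

# Convert weights and tensor types to FP16. keep_io_types leaves the model's
# inputs and outputs in FP32 so existing calling code does not need to change.
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)

onnx.save(model_fp16, "model_fp16.onnx")
```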

Apr 25, 2023
End-to-End AI for NVIDIA-Based PCs: ONNX and DirectML
This post is part of a series about optimizing end-to-end AI. While NVIDIA hardware can process the individual operations that constitute a neural network...
14 MIN READ

Mar 22, 2023
Reusable Computational Patterns for Machine Learning and Information Retrieval with RAPIDS RAFT
RAPIDS is a suite of accelerated libraries for data science and machine learning on GPUs: cuDF for pandas-like data structures, cuGraph for graph data, cuML for...
11 MIN READ

Mar 15, 2023
End-to-End AI for NVIDIA-Based PCs: NVIDIA TensorRT Deployment
This post is the fifth in a series about optimizing end-to-end AI. NVIDIA TensorRT is a solution for speed-of-light inference deployment on NVIDIA hardware....
10 MIN READ