Performance Optimization

Jan 30, 2025
Mastering the cudf.pandas Profiler for GPU Acceleration
In the world of Python data science, pandas has long reigned as the go-to library for intuitive data manipulation and analysis. However, as data volumes grow,...
6 MIN READ

Dec 20, 2024
Accelerating GPU Analytics Using RAPIDS and Ray
RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries that are well supported for scale-out with distributed engines like Spark and...
4 MIN READ

Dec 05, 2024
Unified Virtual Memory Supercharges pandas with RAPIDS cuDF
cuDF-pandas, introduced in a previous post, is a GPU-accelerated library that speeds up pandas, delivering significant performance improvements of up to 50x...
5 MIN READ

Oct 03, 2024
Event: NVIDIA cuOpt at INFORMS 2024
Join NVIDIA cuOpt engineers at INFORMS 2024 on October 22-23 to learn how to revolutionize accelerated computing.
1 MIN READ

Sep 24, 2024
Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo
NVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging...
13 MIN READ

Sep 11, 2024
Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements
CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...
8 MIN READ

Aug 08, 2024
Improving GPU Performance by Reducing Instruction Cache Misses
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large number of compute resources, called streaming...
12 MIN READ

Jul 18, 2024
Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 2, Performance Tuning
In the first part of the series, we presented an overview of the IVF-PQ algorithm and explained how it builds on top of the IVF-Flat algorithm, using the...
14 MIN READ

Jul 18, 2024
Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 1, Deep Dive
In this post, we continue the series on accelerating vector search using NVIDIA cuVS. Our previous post in the series introduced IVF-Flat, a fast algorithm for...
14 MIN READ

Jul 16, 2024
Building an AI Agent for Supply Chain Optimization with NVIDIA NIM and cuOpt
Enterprises face significant challenges in making supply chain decisions that maximize profits while adapting quickly to dynamic changes. Optimal supply chain...
8 MIN READ

Jul 08, 2024
Deploy Multilingual LLMs with NVIDIA NIM
Multilingual large language models (LLMs) are increasingly important for enterprises operating in today's globalized business landscape. As businesses expand...
9 MIN READ

May 10, 2024
Dynamic Control Flow in CUDA Graphs with Conditional Nodes
Post updated on February 3, 2025 with details about CUDA 12.8. CUDA Graphs can provide a significant performance increase, as the driver is able to optimize...
12 MIN READ

Mar 12, 2024
Calculating Video Quality Using NVIDIA GPUs and VMAF-CUDA
Video quality metrics are used to evaluate the fidelity of video content. They provide a consistent quantitative measurement to assess the performance of the...
14 MIN READ

Feb 21, 2024
Limiting CPU Threads for Better Game Performance
Many PC games are designed around an eight-core console with an assumption that their software threading system ‘just works’ on all PCs, especially...
6 MIN READ

Jan 16, 2024
Robust Scene Text Detection and Recognition: Inference Optimization
In this post, we delve deeper into the inference optimization process to improve the performance and efficiency of our machine learning models during the...
9 MIN READ

Jan 16, 2024
Robust Scene Text Detection and Recognition: Implementation
To make scene text detection and recognition work on irregular text or for specific use cases, you must have full control of your model so that you can do...
6 MIN READ