Algorithms / Numerical Techniques

Jan 21, 2026

Streamlining CUB with a Single-Call API

The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional "two-phase" API, which separates memory estimation...

8 MIN READ

Jul 09, 2025

Reinforcement Learning with NVIDIA NeMo-RL: Reproducing a DeepScaleR Recipe Using GRPO

Reinforcement learning (RL) is the backbone of interactive AI. It is fundamental for teaching agents to reason and learn from human preferences, enabling...

5 MIN READ

Mar 04, 2025

GPU-Accelerate Algorithmic Trading Simulations by over 100x with Numba

Quantitative developers need to run back-testing simulations to see how financial algorithms perform from a profit and loss (P&L) standpoint. Statistical...

12 MIN READ

Jul 29, 2024

Building Spatial Intelligence from Real-World 3D Data Using Deep-Learning Framework fVDB

Generative physical AI models can understand and execute actions with fine or gross motor skills within the physical world. Understanding and navigating in the...

6 MIN READ

Jul 18, 2024

Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 2, Performance Tuning

In the first part of the series, we presented an overview of the IVF-PQ algorithm and explained how it builds on top of the IVF-Flat algorithm, using the...

14 MIN READ

Jul 18, 2024

Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 1, Deep Dive

In this post, we continue the series on accelerating vector search using NVIDIA cuVS. Our previous post in the series introduced IVF-Flat, a fast algorithm for...

14 MIN READ

Jul 05, 2024

Explainer: What Is K-Means?

K-means is a clustering algorithm—one of the simplest and most popular unsupervised machine learning (ML) algorithms for data scientists.

1 MIN READ

May 03, 2024

Visual Language Models on NVIDIA Hardware with VILA

Note: As of January 6, 2025, VILA is part of the new Cosmos Nemotron vision language models. Visual language models have evolved significantly recently....

11 MIN READ

Mar 14, 2024

Applying Mixture of Experts in LLM Architectures

Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4, as well as in community models...

12 MIN READ

Mar 08, 2024

cuTENSOR 2.0: Applications and Performance

While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically...

9 MIN READ

Mar 08, 2024

cuTENSOR 2.0: A Comprehensive Guide for Accelerating Tensor Computations

NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array...

17 MIN READ

Oct 10, 2023

Event: AI and Data Science Virtual Summit

Meta, NetworkX, Fast.ai, and other industry leaders share how to gain new insights from your data with emerging tools.

1 MIN READ

Oct 02, 2023

Accelerated Vector Search: Approximating with NVIDIA cuVS Inverted Index

Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...

15 MIN READ

Sep 11, 2023

Accelerating Vector Search: Fine-Tuning GPU Index Algorithms

In this post, we dive deeper into each of the GPU-accelerated indexes mentioned in part 1 and give a brief explanation of how the algorithms work, along with a...

12 MIN READ

Sep 11, 2023

Accelerating Vector Search: Using GPU-Powered Indexes with NVIDIA cuVS

In the current AI landscape, vector search is one of the hottest topics due to its applications in large language models (LLM) and generative AI. Semantic...

11 MIN READ

Aug 04, 2023

ICYMI: Unlocking the Power of GPU-Accelerated DataFrames in Python

Read this tutorial on how to tap into GPUs by importing cuDF instead of pandas, with only a few code changes.

1 MIN READ