Deep dive

Aug 18, 2025
Identify Speakers in Meetings, Calls, and Voice Apps in Real-Time with NVIDIA Streaming Sortformer
In every meeting, call, crowded room, or voice-enabled app, technology faces a core question: who is speaking, and when? For decades, answering that question in...
5 MIN READ

Aug 13, 2025
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2
Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning...
8 MIN READ

Aug 07, 2025
Efficient Transforms in cuDF Using JIT Compilation
RAPIDS cuDF offers a broad set of ETL algorithms for processing data with GPUs. For pandas users, cuDF accelerated algorithms are available with the zero code...
9 MIN READ

Aug 07, 2025
Train with Terabyte-Scale Datasets on a Single NVIDIA Grace Hopper Superchip Using XGBoost 3.0
Gradient-boosted decision trees (GBDTs) power everything from real-time fraud filters to petabyte-scale demand forecasts. The XGBoost open source library has long...
7 MIN READ

Aug 07, 2025
How Hackers Exploit AI's Problem-Solving Instincts
As multimodal AI models advance from perception to reasoning, and even start acting autonomously, new attack surfaces emerge. These threats don’t just target...
10 MIN READ

Aug 05, 2025
NVIDIA vGPU 19.0 Enables Graphics and AI Virtualization on NVIDIA Blackwell GPUs
Virtualization has long promised efficiency and scalability. However, challenges persist due to the increasing demands of graphics and compute workloads, along...
6 MIN READ

Aug 04, 2025
NVIDIA CUDA-Q 0.12 Expands Toolset for Developing Hardware-Performant Quantum Applications
NVIDIA CUDA-Q 0.12 introduces new simulation tools for accelerating how researchers develop quantum applications and design performant quantum hardware. With...
7 MIN READ

Jul 31, 2025
Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails
Prompt injection, where adversaries manipulate inputs to make large language models behave in unintended ways, has posed a threat to AI systems since the...
8 MIN READ

Jul 30, 2025
Using CI/CD to Automate Network Configuration and Deployment
Continuous integration and continuous delivery/deployment (CI/CD) is a set of modern software development practices used for delivering code changes more...
6 MIN READ

Jul 23, 2025
Approaches to PDF Data Extraction for Information Retrieval
The PDF is among the most common file formats for sharing information such as financial reports, research papers, technical documents, and marketing materials....
11 MIN READ

Jul 22, 2025
Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication
The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to...
14 MIN READ

Jul 22, 2025
Building Robotic Mental Models with NVIDIA Warp and Gaussian Splatting
This post explores a promising direction for building dynamic digital representations of the physical world, a topic gaining increasing attention in recent...
4 MIN READ

Jul 21, 2025
Traditional RAG vs. Agentic RAG—Why AI Agents Need Dynamic Knowledge to Get Smarter
Ever relied on an old GPS that didn’t know about the new highway bypass, or a sudden road closure? It might get you to your destination, but not in the most...
8 MIN READ

Jul 17, 2025
Safeguard Agentic AI Systems with the NVIDIA Safety Recipe
As large language models (LLMs) power more agentic systems capable of performing autonomous actions, tool use, and reasoning, enterprises are drawn to their...
7 MIN READ

Jul 16, 2025
CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
12 MIN READ

Jul 16, 2025
CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels
In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models...
12 MIN READ