The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within months and drove a surge of development activity across almost every industry. By 2023, developers had begun building proofs of concept (POCs) using APIs and open-source community models from Meta, Mistral, Stability, and more. Entering 2024…
What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of increased capacity. The benefits are great, but training and deploying large models can be computationally expensive and resource-intensive. Computationally efficient, cost-effective, and energy-efficient systems, architected to deliver real-time…
Generative AI has the potential to transform every industry. Human workers are already using large language models (LLMs) to explain, reason about, and solve difficult cognitive tasks. Retrieval-augmented generation (RAG) connects LLMs to data, expanding the usefulness of LLMs by giving them access to up-to-date and accurate information. Many enterprises have already started to explore how…
In the era of generative AI, where machines are not just learning from data but generating human-like text, images, video, and more, retrieval-augmented generation (RAG) stands out as a groundbreaking approach. A RAG workflow builds on large language models (LLMs), which can understand queries and generate responses. However, LLMs have limitations, including training complexity and a lack of…
At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M pandas users without requiring them to change their code. pandas, a flexible and powerful data analysis and manipulation library for Python, is a top choice for data scientists because of its easy-to-use API. However, as dataset sizes grow, it struggles with processing speed and efficiency in…
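The "zero code change" idea can be illustrated with an ordinary pandas script (the DataFrame contents below are invented for illustration):

```python
import pandas as pd

# A typical pandas workflow: aggregate sales by region, largest first.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "north"],
    "sales": [100, 250, 175, 90, 300],
})
totals = df.groupby("region")["sales"].sum().sort_values(ascending=False)
print(totals)
```

Assuming RAPIDS cuDF is installed on a system with a supported GPU, the same script runs unchanged on the GPU via `python -m cudf.pandas script.py`, or `%load_ext cudf.pandas` in a Jupyter notebook.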
Autonomous machine development is an iterative process of data generation and gathering, model training, and deployment characterized by complex multi-stage, multi-container workflows across heterogeneous compute resources. Multiple teams are involved, each requiring shared and heterogeneous compute. Furthermore, teams want to scale certain workloads into the cloud…
Across the globe, enterprises are realizing the benefits of generative AI models. They are racing to adopt these models in various applications, such as chatbots, virtual assistants, coding copilots, and more. While general-purpose models work well for simple tasks, they underperform when it comes to catering to the unique needs of various industries. Custom generative AI models outperform…
Across every industry, and every job function, generative AI is activating the potential within organizations—turning data into knowledge and empowering employees to work more efficiently. Accurate, relevant information is critical for making data-backed decisions. For this reason, enterprises continue to invest in ways to improve how business data is stored, indexed, and accessed.
A random forest is a supervised algorithm that uses an ensemble learning method consisting of a multitude of decision trees, the output of which is the consensus of the best answer to the problem. Random forest can be used for classification or regression.
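The ensemble-of-trees idea can be sketched in a few lines. This is an illustrative toy (depth-1 "stump" trees trained on bootstrap samples, with a majority vote as the consensus), not a production implementation; real libraries grow full decision trees and sample features as well as rows:

```python
import random
from collections import Counter

def train_stump(X, y):
    # Find the (feature, threshold) split with the fewest training errors.
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            errors = sum(yi != lmaj for yi in left) + sum(yi != rmaj for yi in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lmaj, rmaj)
    if best is None:  # degenerate sample: fall back to the majority label
        maj = Counter(y).most_common(1)[0][0]
        return lambda row: maj
    _, f, t, lmaj, rmaj = best
    return lambda row: lmaj if row[f] <= t else rmaj

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    # Consensus: each tree votes; the majority answer wins.
    return lambda row: Counter(tree(row) for tree in trees).most_common(1)[0][0]

X = [[1, 5], [2, 6], [8, 1], [9, 2]]
y = ["a", "a", "b", "b"]
predict = random_forest(X, y)
```

Averaging many trees trained on different resamples is what makes the ensemble more robust than any single tree.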
Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4 and in community models with the open-source release of Mistral AI’s Mixtral 8x7B. The strong relative performance of the Mixtral model has raised much interest and numerous questions about MoE and its use in LLM architectures. So, what is MoE and why is it important?
As ray tracing becomes the predominant rendering technique in modern game engines, a single GPU RayGen shader can now perform most of the light simulation of a frame. To manage this level of complexity, it becomes necessary to observe a decomposition of shader performance at the HLSL or GLSL source-code level. As a result, shader profilers are now a must-have tool for optimizing ray tracing.
NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6 adds support for the NVIDIA Hopper architecture.
The development of useful quantum computing is a massive global effort, spanning government, enterprise, and academia. The benefits of quantum computing could help solve some of the most challenging problems in the world related to applications such as materials simulation, climate modeling, risk management, supply chain optimization, and bioinformatics. Realizing the benefits of quantum…
NVIDIA Holoscan for Media is a software-defined platform for building and deploying applications for live media. Recent updates introduce a user-friendly developer interface and new capabilities for application deployment to the platform. Holoscan for Media now includes Helm Dashboard, which delivers an intuitive user interface for orchestrating and managing Helm charts.
Video quality metrics are used to evaluate the fidelity of video content. They provide a consistent quantitative measurement to assess the performance of the encoder. VMAF combines human vision modeling with machine learning techniques that are continuously evolving, enabling it to adapt to new content. VMAF excels in aligning with human visual perception by combining detailed analysis…
GPU-driven rendering has long been a major goal for many game applications. It enables better scalability for handling large virtual scenes and reduces cases where the CPU could bottleneck a game’s performance. Short of running the game’s logic on the GPU, I see the pinnacle of GPU-driven rendering as a scenario in which the CPU sends the GPU only the new frame’s camera information…
When it comes to game application performance, GPU-driven rendering enables better scalability for handling large virtual scenes. Direct3D 12 (D3D12) introduces work graphs as a programming paradigm that enables the GPU to generate work for itself on the fly. For an introduction to work graphs, see Advancing GPU-Driven Rendering with Work Graphs in Direct3D 12. This post features a Direct3D…
Today, NVIDIA and the Alliance for OpenUSD (AOUSD) announced the AOUSD Materials Working Group, an initiative for standardizing the interchange of materials in Universal Scene Description, known as OpenUSD. As an extensible framework and ecosystem for describing, composing, simulating, and collaborating within 3D worlds, OpenUSD enables developers to build interoperable 3D workflows…
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically usage from Python and Julia, and demonstrates the performance of cuTENSOR with benchmarks in a number of application domains. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update—in both functionality and performance—over its predecessor. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities all…
Graph neural networks (GNNs) have revolutionized machine learning for graph-structured data. Unlike traditional neural networks, GNNs are good at capturing intricate relationships in graphs, powering applications from social networks to chemistry. They shine particularly in scenarios like node classification, where they predict labels for graph nodes, and link prediction, where they determine the…
Graph analytics, or graph algorithms, are analytic tools used to determine the strength and direction of relationships between objects in a graph. The focus of graph analytics is on pairwise relationships between two objects at a time and the structural characteristics of the graph as a whole.
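One of the simplest such measures is weighted degree: the total strength of the pairwise relationships touching each node. A minimal sketch (node names and weights invented for illustration; libraries such as NetworkX or RAPIDS cuGraph provide this and far richer algorithms at scale):

```python
from collections import defaultdict

# Edges as (source, target, weight): the strength of each pairwise relationship.
edges = [
    ("alice", "bob", 3.0),
    ("alice", "carol", 1.0),
    ("bob", "carol", 2.0),
    ("carol", "dave", 4.0),
]

# Weighted degree: sum of edge weights touching each node, a simple
# structural measure of how strongly connected a node is.
strength = defaultdict(float)
for u, v, w in edges:
    strength[u] += w
    strength[v] += w

for node, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(node, s)
```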
In the dynamic realm of generative AI, diffusion models stand out as the most powerful architecture for generating high-quality images with text prompts. Models like Stable Diffusion have revolutionized creative applications. However, the inference process of diffusion models can be computationally intensive due to the iterative denoising steps required. This presents significant challenges…
Learn how AI and NVIDIA Maxine are transforming the video streaming and conferencing industry.
Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated art through denoising diffusion techniques. This can be applied to many enterprise use cases such as creating personalized content for marketing, generating imaginative backgrounds for objects in…
We are so excited to be back in person at GTC this year at the San Jose Convention Center. With thousands of developers, industry leaders, researchers, and partners in attendance, attending GTC in person gives you the unique opportunity to network with legends in technology and AI, and experience NVIDIA CEO Jensen Huang’s keynote live on-stage at the SAP Center. Past GTC alumni? Get 40%
Migrating between major versions of software can present several challenges to infrastructure management teams. These challenges can prevent users from adopting the newer versions, so they miss out on newer, more powerful features. Effective planning and thorough testing are essential to overcoming these challenges and ensuring a smooth transition. Cumulus Linux 3.7.x and 4.x.
Federated learning (FL) is experiencing accelerated adoption due to its decentralized, privacy-preserving nature. In sectors such as healthcare and financial services, FL, as a privacy-enhanced technology, has become a critical component of the technical stack. In this post, we discuss FL and its advantages, delving into why federated learning is gaining traction. We also introduce three key…
From cities and airports to Olympic Stadiums, AI is transforming public spaces into safer, smarter, and more sustainable environments.
The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new features and enhancements included in this release: CUDA and the CUDA Toolkit software provide the foundation for all NVIDIA GPU-accelerated computing applications in data science and analytics, machine learning…
Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in quantitative investment contexts. They contain a wide range of functionalities, often proprietary, to support the valuation, risk management, construction, and optimization of investment portfolios. Financial firms that develop such…
In 2022, the city of Lismore, Australia, bore the brunt of devastating floods, leaving over 3K homes damaged and communities shattered. With $6B in losses, this was the second-costliest event in the world for insurers in 2022 and the most expensive disaster in Australian history. With each passing year, natural disaster events such as those experienced in Lismore grow in rate and scale across…
For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.
This week’s model release features the NVIDIA-optimized language model Smaug 72B, which you can experience directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Try leading models such as Nemotron-3, Mixtral 8x7B, Gemma 7B…
Hear from ExxonMobil, Honeywell, Siemens Energy, and more as they explore AI and HPC innovation in oil, gas, power, and utilities.
Stream processing is the continuous processing of new data events as they’re received. A lot of data is produced as a stream of events, for example financial transactions, sensor measurements, or web server logs.
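A minimal sketch of the idea in Python: events are consumed one at a time as they arrive, and the processor keeps only constant-size running state rather than buffering the whole stream (the sensor values and threshold here are invented for illustration):

```python
def sensor_stream():
    # Stand-in for a live event source (e.g., sensor measurements).
    for value in [21.0, 21.5, 22.0, 35.0, 22.5]:
        yield value

def process(stream, threshold=30.0):
    # Handle each event as it is received, maintaining a running
    # average and flagging anomalous readings on the fly.
    count, total, alerts = 0, 0.0, []
    for value in stream:
        count += 1
        total += value
        if value > threshold:
            alerts.append((count, value))
    return total / count, alerts

mean, alerts = process(sensor_stream())
```

In a production system the generator would be replaced by a real event source (a message queue or socket), but the shape of the computation is the same.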
Hear from Amdocs, Indosat, KT, NTT, ServiceNow, Singtel, SoftBank, and Verizon, plus a special address from NVIDIA at GTC. Explore AI transforming customer service, network operations, sovereign AI factories, and AI-RAN.
Learn how synthetic data is supercharging 3D simulation and computer vision workflows, from visual inspection to autonomous machines.
Gain a foundational understanding of USD, the open and extensible framework for creating, editing, querying, rendering, collaborating, and simulating within 3D worlds.
In the ever-evolving landscape of large language models (LLMs), effective data management is a key challenge. Data is at the heart of model performance. While most advanced machine learning algorithms are data-centric, necessary data can’t always be centralized. This is due to various factors such as privacy, regulation, geopolitics, copyright issues, and the sheer effort required to move vast…
Learn how to build a RAG-powered application with a human voice interface at NVIDIA GTC 2024 Speech and Generative AI Developer Day.
Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep learning–based computational methods have been emerging and have shown promising results. Among these lines of work, AlphaFold2 is the first method that has achieved results comparable to slower physics-based computational methods.
Join us on March 20 for Cybersecurity Developer Day at GTC to gain insights on leveraging generative AI for cyber defense.
Coding is essential in the digital age, but it can also be tedious and time-consuming. That’s why many developers are looking for ways to automate and streamline their coding tasks with the help of large language models (LLMs). These models are trained on massive amounts of code from permissively licensed GitHub repositories and can generate, analyze, and document code with little human…
Join experts from NVIDIA and the public sector industry to learn how cybersecurity, generative AI, digital twins, and more are impacting the way that government agencies operate.
Retrieval-augmented generation (RAG) is exploding in popularity as a technique for boosting large language model (LLM) application performance. From highly accurate question-answering AI chatbots to code-generation copilots, organizations across industries are exploring how RAG can help optimize processes. According to State of AI in Financial Services: 2024 Trends, 55%
This week’s model release features the NVIDIA-optimized language model Phi-2, which can be used for a wide range of natural language processing (NLP) tasks. You can experience Phi-2 directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications.
The past few decades have witnessed a surge in rates of waste generation, closely linked to economic development and urbanization. This escalation in waste production poses substantial challenges for governments worldwide in terms of efficient processing and management. Despite the implementation of waste classification systems in developed countries, a significant portion of waste still ends up…
Connect with industry leaders, learn from technical experts, and collaborate with peers at NVIDIA GTC 2024 Developer Days.
For developers working on Microsoft DirectX ray-tracing applications, ray-tracing validation is here to help you improve performance, find hard-to-debug issues, and root cause crashes. Unlike existing debug solutions, ray-tracing validation performs checks at the driver level, which enables it to identify potential problems that cannot be caught by tools such as the D3D12 Debug Layer.
Discover a wide variety of AI tools and resources designed to equip students with practical solutions for real-world problem-solving. Join experts from NVIDIA, Google, OpenAI, Stanford, UC Berkeley, and more throughout GTC week.
Energy efficiency refers to a system or device’s ability to use as little energy as possible to perform a particular task or function within acceptable limits. Essentially, it means using energy in the most effective way possible and minimizing waste. There are many applications, such as energy-efficient windows or homes, but to understand energy efficiency from an NVIDIA perspective…
The conversation about designing and evaluating retrieval-augmented generation (RAG) systems is a long, multi-faceted discussion. Even when we look at retrieval on its own, developers selectively employ many techniques, such as query decomposition, re-writing, building soft filters, and more, to increase the accuracy of their RAG pipelines. While the techniques vary from system to system…
Join experts from Stanford, Cornell, Meta, and more to learn about the latest in AI for academia and what’s next in cutting-edge research.
NVIDIA Spectrum-X is swiftly gaining traction as the leading networking platform tailored for AI in hyperscale cloud infrastructures. Spectrum-X networking technologies help enterprise customers accelerate generative AI workloads. NVIDIA announced significant OEM adoption of the platform in a November 2023 press release, along with an update on the NVIDIA Israel-1 Supercomputer powered by Spectrum…
Developers and enterprises can now deploy lifelike virtual and mixed reality experiences with Varjo’s latest XR-4 series headsets, which are integrated with NVIDIA technologies. These XR headsets match the resolution that the human eye can see, providing users with realistic visual fidelity and performance. The latest XR-4 series headsets support NVIDIA Omniverse and are powered by NVIDIA…
Discover the transformative power of computer vision and video analytics at GTC. Dive into cutting-edge techniques such as vision transformers, AI agents, multi-modal foundation models, 3D technology, large language models (LLMs), vision language models (VLMs), generative AI, and more.
Developers have long been building interfaces like web apps to enable users to leverage the core products being built. To learn how to work with data in your large language model (LLM) application, see my previous post, Build an LLM-Powered Data Agent for Data Analysis. In this post, I discuss a method to add free-form conversation as another interface with APIs. It works toward a solution that…
HOMEE AI, an NVIDIA Inception member based in Taiwan, has developed an “AI-as-a-service” spatial planning solution to disrupt the $650B global home decor market. They’re helping furniture makers and home designers find new business opportunities in the era of industrial digitalization. Using NVIDIA Omniverse, the HOMEE AI engineering team developed an enterprise-ready service to deliver…
Discover why OpenUSD is central to the future of 3D development with Aaron Luk, a founding developer of Universal Scene Description.
Many PC games are designed around an eight-core console with an assumption that their software threading system ‘just works’ on all PCs, especially regarding the number of threads in the worker thread pool. This was a reasonable assumption not too long ago when most PCs had similar core counts to consoles: the CPUs were just faster and performance just scaled. In recent years though…
On March 5, 8am PT, learn how NVIDIA Metropolis microservices for Jetson Orin helps you modernize your app stack, streamline development and deployment, and future-proof your apps with the ability to bring the latest generative AI capabilities to any customer through simple API calls.
NVIDIA is collaborating as a launch partner with Google in delivering Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An optimized release with TensorRT-LLM enables users to develop with LLMs using only a desktop with an NVIDIA RTX GPU. Created by Google DeepMind, Gemma 2B and Gemma 7B—the first models in the series…
Join us at the Game Developers Conference March 18-22 to discover how the latest generative AI and NVIDIA RTX technologies are accelerating game development.
This week’s model release features NVIDIA cuOpt, a world-record-breaking accelerated optimization engine that helps teams solve complex routing problems and deliver new capabilities. It enables organizations to reimagine logistics, operations research, transportation, and supply chain optimization. NVIDIA cuOpt facilitates many logistics optimization use cases, including: Ultimately…
A virtual digital assistant is a program that understands natural language and can answer questions or complete tasks based on voice commands.
Advances in AI are rapidly transforming every industry. Join us in person or virtually to learn about the latest technologies, from retrieval-augmented generation to OpenUSD.
The quest for new, effective treatments for diseases that remain stubbornly resistant to current therapies is at the heart of drug discovery. This traditionally long and expensive process has been radically improved by AI techniques like deep learning, empowered by the rise of accelerated computing. Receptor.AI, a London-based drug discovery company and NVIDIA Inception member…
Discover how generative AI is powering cybersecurity solutions with enhanced speed, accuracy, and scalability.
The NVIDIA DOCA 2.6 release includes support for NVIDIA Spectrum-X reference architecture with the NVIDIA BlueField-3 SuperNIC and enhances DOCA host-based networking (HBN).
On March 19, learn how to build generative AI-enabled 3D pipelines and tools using Universal Scene Description for industrial digitalization.
Learn how inference for LLMs is driving breakthrough performance for AI-enabled applications and services.
This week’s release features the NVIDIA-optimized Mamba-Chat model, which you can experience directly from your browser. This post is part of Model Mondays, a program focused on enabling easy access to state-of-the-art community and NVIDIA-built models. These models are optimized by NVIDIA using TensorRT-LLM and offered as .nemo files for easy customization and deployment.
With the GTC session catalog now live, it’s time to start building your personalized agenda for the conference. For those of you who will be joining us in San Jose, this post covers the technical training opportunities that you won’t want to miss. If you can’t attend GTC in person, please take advantage of the 15 virtual workshops scheduled in EMEA, India, and China time zones.
Cluster analysis is the grouping of objects such that objects in the same cluster are more similar to each other than they are to objects in another cluster.
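One of the simplest clustering algorithms, k-means, makes this concrete: points are assigned to their nearest cluster center, and centers move to the mean of their assigned points. A minimal 1-D sketch (an illustrative toy with invented data, not a library implementation):

```python
def kmeans_1d(points, k=2, iters=10):
    # Assign each point to its nearest center, then move each center
    # to the mean of its assigned points; repeat until stable.
    centers = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups: values near 1 and values near 8.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, clusters = kmeans_1d(points)
```

After convergence the centers sit at the means of the two groups, and each cluster contains the points most similar to its own center.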
Speakers from NVIDIA, Meta, Microsoft, OpenAI, and ServiceNow will be talking about the latest tools, optimizations, trends, and best practices for large language models (LLMs).
CUDA Quantum is an open-source programming model for building quantum-classical applications. Useful quantum computing workloads will run on heterogeneous computing architectures such as quantum processing units (QPUs), GPUs, and CPUs in tandem to solve real-world problems. CUDA Quantum enables the acceleration of such applications by providing the tools to program these computing architectures…
Visual generative AI is the process of creating images from text prompts. The technology is based on vision-language foundation models that are pretrained on web-scale data. These foundation models are used in many applications by providing a multimodal representation. Examples include image captioning and video retrieval, creative 3D and 2D image synthesis, and robotic manipulation.
Join us in-person or virtually and learn about the power of RAG with insights and best practices from experts at NVIDIA, visionary CEOs, data scientists, and others.
This week’s Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T models, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta’s Code Llama 70B is the latest…
Large language models (LLMs) have revolutionized the field of AI, creating entirely new ways of interacting with the digital world. While they provide a good generalized solution, they often must be tuned to support specific domains and tasks. AI coding assistants, or code LLMs, have emerged as one domain to help accomplish this. By 2025, 80% of the product development lifecycle will make…
This NVIDIA HPC SDK update includes the cuBLASMp preview library, along with minor bug fixes and enhancements.
NVIDIA Modulus 24.01 updates distributed utilities and samples for physics-informed DeepONet and GNN models.
Synthetic data generation is a data augmentation technique necessary for increasing the robustness of models by supplying training data. Explore the use of Transformers for synthetic tabular data generation in the new self-paced course.
NVIDIA AI Workbench is now in beta, bringing a wealth of new features to streamline how enterprise developers create, use, and share AI and machine learning (ML) projects. Announced at SIGGRAPH 2023, NVIDIA AI Workbench enables developers to create, collaborate, and migrate AI workloads on their GPU-enabled environment of choice. To learn more, see Develop and Deploy Scalable Generative AI Models…
Accelerated networking combines CPUs, GPUs, DPUs (data processing units), or SuperNICs into an accelerated computing fabric specifically designed to optimize networking workloads. It uses specialized hardware to offload demanding tasks to enhance server capabilities. As AI and other new workloads continue to grow in complexity and scale, the need for accelerated networking becomes paramount.
The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs) have been the cornerstone of this revolution, exhibiting exceptional performance and enabling significant advancements in visual perception. By employing localized filters and hierarchical architectures, CNNs have proven adept at…
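The "localized filter" idea behind CNNs can be sketched with a one-dimensional convolution: each output value depends only on a small neighborhood of the input, and the same filter weights are reused at every position (the signal and kernel values here are invented for illustration):

```python
def conv1d(signal, kernel):
    # Slide a small, localized filter across the input; each output
    # value is a weighted sum over one local window of the signal.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference filter responds where neighboring values change,
# the 1-D analogue of an edge detector in image CNNs.
signal = [0, 0, 0, 1, 1, 1]
print(conv1d(signal, [-1, 1]))  # nonzero only at the step
```

Stacking such layers, with learned filter weights, is what gives CNNs their hierarchical feature extraction.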
Learn the basics of retrieval-augmented generation (RAG), an end-to-end architecture used to optimize the output of an LLM.
Building vision AI applications for the edge often comes with notoriously long and costly development cycles. At the same time, quickly developing edge AI applications that are cloud-native, flexible, and secure has never been more important. Now, a powerful yet simple API-driven edge AI development workflow is available with the new NVIDIA Metropolis microservices.
While harnessing the potential of AI is a priority for many of today’s enterprises, developing and deploying an AI model involves time and effort. Often, challenges must be overcome to move a model into production, especially for mission-critical business operations. According to IDC research, only 18% of enterprises surveyed could put an AI model into production in under a month.
Following the introduction of ChatGPT, enterprises around the globe are realizing the benefits and capabilities of AI, and are racing to adopt it into their workflows. As this adoption accelerates, it becomes imperative for enterprises not only to keep pace with the rapid advancements in AI, but also address related challenges such as optimization, scalability, and security.
As a comprehensive software framework for data center infrastructure developers, NVIDIA DOCA has been adopted by leading AI, cloud, enterprise, and ISV innovators. The release of DOCA 2.5 marks its third anniversary. And, due to the stability and robustness of the code base combined with several networking and platform upgrades, DOCA 2.5 is the first NVIDIA BlueField-3 long-term support (LTS)…
Learn how generative AI can help defend against spear phishing in this January 30 webinar.
As industrial automation increases, safety becomes a greater challenge and top priority for enterprises. Safety encompasses multiple aspects: The same technological solution that’s driving automation can be used to also address safety: artificial intelligence. AI-powered stationary outside-in safety platforms, which monitor activity across many distributed machines or robots…
A common technological misconception is that performance and complexity are directly linked. That is, the highest-performance implementation is also the most challenging to implement and manage. When considering data center networking, however, this is not the case. InfiniBand is a protocol that sounds daunting and exotic in comparison to Ethernet, but because it is built from the ground up…
NVIDIA Metropolis Microservices for Jetson provides a suite of easy-to-deploy services that enable you to quickly build production-quality vision AI applications while using the latest AI approaches. This post explains how to develop and deploy generative AI–powered applications with Metropolis Microservices on the NVIDIA Jetson edge AI platform by walking through a reference example that can…
NVIDIA Metropolis microservices provide powerful, customizable, cloud-native APIs and microservices to develop vision AI applications and solutions. The framework now includes NVIDIA Jetson, enabling developers to quickly build and productize performant and mature vision AI applications at the edge. APIs enhance flexibility, interoperability, and efficiency in software development by enabling…
NVIDIA AI Foundation Models and Endpoints provides access to a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. On Mondays throughout the year, we’ll be releasing new models. This week, we released the NVIDIA-optimized DePlot model, which you can experience directly from your browser. If you haven’t already…
Robots are typically equipped with cameras. When designing a digital twin simulation, it’s important to accurately replicate camera performance in the simulated environment. However, to make sure the simulation runs smoothly, it’s crucial to check the performance of the workstation that is running the simulation. In this blog post, we explore the steps to setting up and running a camera benchmark…