
NVIDIA Vera CPU Sets a New Standard for Agentic Workloads in AI Factories

May 31, 2026

By Praveen Menon, Ivan Goldwasser, Ian Finder and Diana Aung

Each wave of AI has created a new scaling law. Pretraining scaled intelligence through larger datasets, more parameters, and massively parallel GPU systems. Post-training scaled usefulness through instruction tuning and by rebalancing GPUs for generative inference. Test-time scaling improved reasoning by giving models more generated tokens for thinking.

Now, agentic AI and reinforcement learning scale actions. Models take more steps, call more tools, run more evaluations, and interact with execution environments to perform tasks.

This post explains how NVIDIA Vera CPUs help AI factories scale agentic AI and reinforcement learning by shortening CPU execution time, increasing task throughput, improving overall AI factory output, and enabling smarter, longer-thinking agents.

Why CPUs matter more in the agentic era

GPUs remain essential for model inference and training. But across agentic AI, reinforcement learning, and data-intensive AI services, much of the work surrounding the model runs on CPUs, including:

Sandboxed code and tool execution
Data retrieval and data processing
Results computation
Scheduling and orchestration

The interaction between GPU and CPU forms a precise loop:

A prompt (either from a user, reasoning tokens, or a previous turn’s result) kicks off generation: “I should compile and run hello.c.”
The GPU generates the parameters of the tool call to be performed on the CPU: gcc -o hello hello.c ; ./hello
The CPU executes the tool call, producing results that are fed back to the GPUs to update weights during reinforcement learning, or used by the agent to generate the next prompt: Output: ‘Hello, world!’ – Task Returned (0) – Successful
The GPU generates reasoning tokens prompted by the result: “Hmm! It looks like that worked!”
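The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular agent framework works: `fake_model` is a hypothetical stand-in for GPU token generation, the tool call runs the current Python interpreter rather than `gcc` so the example stays portable, and a plain child process is used where a production system would use a real sandbox.

```python
import subprocess
import sys


def fake_model(history):
    """Hypothetical stand-in for the GPU: return the next tool call, or None when done."""
    if not history:
        # Portable substitute for `gcc -o hello hello.c ; ./hello`.
        return [sys.executable, "-c", "print('Hello, world!')"]
    return None  # this stub stops after the first result comes back


def run_tool(argv, timeout_s=10.0):
    """CPU-side step: execute the tool call in a child process and capture the result."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return {"stdout": proc.stdout.strip(), "returncode": proc.returncode}


def agent_loop(max_steps=8):
    """Alternate between 'generation' and tool execution until the model stops."""
    history = []
    for _ in range(max_steps):
        call = fake_model(history)
        if call is None:
            break
        history.append(run_tool(call))  # the result feeds the next generation step
    return history


history = agent_loop()
print(history)
```

Every pass through `run_tool` is time during which the request is waiting on the CPU, which is why the speed of that step matters for the whole loop.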

As agents become more capable, they take more steps, call more tools, and run more checks. CPU time compounds across the request.

This makes the CPU part of the critical path. It’s no longer just a host processor feeding the GPU. It shapes latency, accelerator utilization, and AI factory output per watt and per dollar.
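A back-of-the-envelope model makes the compounding concrete. The sketch below assumes a single request whose steps run strictly in sequence, with the GPU idle while each tool call executes. The function name and every timing value are illustrative assumptions, not measurements.

```python
def request_latency(steps, gpu_s_per_step, cpu_s_per_step):
    """Latency of one request that alternates GPU generation and CPU tool execution.

    Simplifying assumption: the two phases never overlap. `steps` must be > 0.
    """
    gpu_total = steps * gpu_s_per_step
    cpu_total = steps * cpu_s_per_step
    total = gpu_total + cpu_total
    return {
        "latency_s": total,
        "cpu_s": cpu_total,
        "gpu_idle_fraction": cpu_total / total,
    }


# Hypothetical timings: 0.5 s of generation and 0.5 s of tool execution per step.
# Total CPU time grows with the number of steps the agent takes.
for steps in (5, 20, 80):
    print(steps, request_latency(steps, 0.5, 0.5))

# Halving the (hypothetical) CPU time per step shortens the whole request
# and reduces the share of time the accelerator spends waiting.
baseline = request_latency(20, 0.5, 0.5)
faster_cpu = request_latency(20, 0.5, 0.25)
print(baseline["latency_s"], faster_cpu["latency_s"])
```

Real systems overlap many requests, so the idle fraction here is a worst case for a single request rather than a prediction of fleet utilization.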

For the last decade, much of the data center CPU market optimized around cloud economics of more cores, more virtual machines, and lower cost per core. This remains important for general-purpose cloud services, but performance per core has not improved at the same rate.

This is further compounded by the end of Moore’s law, which limited generation-on-generation performance improvements in CPUs, even while GPU architectures and workloads benefited from a continuous cycle of co-optimization.

AI factories shift the metric from cores per dollar to tokens per dollar—from how many CPU cores a data center can rent, to how much AI output it can produce.

This demands a new CPU design point for AI factories:

High core counts to run thousands of concurrent agents, RL environments, sandboxes, and services.
High per-core performance, because each agentic step is gated by sequential execution.
Energy-efficient memory bandwidth to keep data moving without turning CPU infrastructure into a bottleneck.
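The first two requirements interact. As a rough sketch, assume each task occupies one core for its whole duration and that cores do not slow each other down (both idealizations). Steady-state throughput is then the core count divided by the CPU seconds per task. The names and numbers below are hypothetical.

```python
def tasks_per_second(cores, cpu_s_per_task):
    """Steady-state throughput of one socket in the idealized one-task-per-core model."""
    return cores / cpu_s_per_task


base = tasks_per_second(cores=64, cpu_s_per_task=2.0)
more_cores = tasks_per_second(cores=128, cpu_s_per_task=2.0)   # double the cores
faster_cores = tasks_per_second(cores=64, cpu_s_per_task=1.0)  # double the per-core speed

print(base, more_cores, faster_cores)
```

Both changes double throughput in this model, but only faster cores also halve the time each individual step takes, which is what a waiting agent or accelerator experiences.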

The NVIDIA Vera CPU: Built for AI agents

The NVIDIA Vera CPU is designed for the reality of modern workloads, with fast per-core performance, high concurrency, and power-efficient memory bandwidth to keep the AI factory moving.

The Vera CPU combines 88 NVIDIA Olympus cores with up to 1.2 TB/s of LPDDR5X memory bandwidth to keep cores fed through tool calls, sandboxed execution of both native code and languages like Python or JavaScript, data retrieval, data processing, and orchestration.

The key requirement is fast per-core performance, sustained at all times. Unlike cloud virtual machines, the CPU sockets stay fully loaded, doing the work of many concurrent agents. Cores that remain fast under high system load reduce task completion time, delivering faster results while freeing up resources to serve the next request.

For agents, this means lower latency across multistep requests. For reinforcement learning, this means more completed evaluations and more data from each training window, helping models reach a higher quality bar faster. For AI factories, fast cores keep accelerators from waiting on orchestration, tool execution, or data movement.

Delivering this requires the core, memory subsystem, and fabric to be designed together for branch-heavy code, high-bandwidth data movement, and predictable performance under load.

This starts with the NVIDIA custom Olympus core inside the Vera CPU.

NVIDIA Olympus core and memory subsystem

The NVIDIA Olympus core delivers up to 50% higher IPC than NVIDIA Grace, combining a wide front end, advanced branch prediction, deep out-of-order instruction scheduling, and specialized memory prefetching to sustain high throughput on branch-heavy, memory-sensitive agentic code.

Olympus uses a neural branch predictor to reduce stalls in branch-heavy code. Combined with other prediction mechanisms, it can sustain two taken branches per cycle with zero penalty, maintaining throughput for deep software stacks such as PyTorch, graph workloads, and scripting engines.

Olympus also includes a 10-wide decode unit and a deep out-of-order engine designed to sustain high instructions per cycle. Large buffers and advanced instruction scheduling help the core maintain forward progress as code paths, dependencies, and memory access patterns shift.

Sustaining high IPC under load requires keeping the cores fed with data. Vera CPUs deliver up to 1.2 TB/s of LPDDR5X memory bandwidth, sustaining over 90% of peak memory bandwidth under load. They also offer 40% lower peak memory latency compared to x86 CPUs, ensuring Olympus cores are fed on time through retrieval, analytics, sandbox execution, and orchestration.
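To put the bandwidth figures in per-core terms, the quoted numbers can be divided across the 88 cores. This is simple arithmetic under two assumptions that do not come from the source: 1 TB is taken as 10^12 bytes, and bandwidth is split evenly across cores, which real workloads will not do exactly.

```python
PEAK_BW_BYTES_PER_S = 1.2e12  # "up to 1.2 TB/s", assuming 1 TB = 10**12 bytes
CORES = 88                    # Olympus cores per Vera CPU
SUSTAINED_FRACTION = 0.90     # "over 90% of peak", used here as a lower bound

peak_per_core_gb_s = PEAK_BW_BYTES_PER_S / CORES / 1e9
sustained_total_tb_s = PEAK_BW_BYTES_PER_S * SUSTAINED_FRACTION / 1e12
sustained_per_core_gb_s = PEAK_BW_BYTES_PER_S * SUSTAINED_FRACTION / CORES / 1e9

print(f"peak per core:      {peak_per_core_gb_s:.1f} GB/s")       # 13.6 GB/s
print(f"sustained total:    {sustained_total_tb_s:.2f} TB/s")     # 1.08 TB/s
print(f"sustained per core: {sustained_per_core_gb_s:.1f} GB/s")  # 12.3 GB/s
```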

Olympus also adds a novel graph prefetcher built for indirect memory access patterns common in graph analytics and agent memory traversal. Combined with high per-core memory bandwidth, Vera CPUs deliver more than 3x the performance on graph traversal workloads compared with x86-based architectures.
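For readers unfamiliar with the term, an indirect access is a load whose address depends on the value returned by an earlier load. The breadth-first search below, over a graph in compressed sparse row (CSR) form, shows the pattern. It illustrates the access pattern only and makes no claim about how the Olympus prefetcher works. The function and the tiny graph are made up for the example.

```python
from collections import deque


def bfs_csr(row_offsets, col_indices, source):
    """Breadth-first search over a CSR graph; returns hop distance per node (-1 if unreachable)."""
    num_nodes = len(row_offsets) - 1
    dist = [-1] * num_nodes
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # The edge range is contiguous, so these loads are sequential and easy to predict.
        for e in range(row_offsets[u], row_offsets[u + 1]):
            v = col_indices[e]
            # dist[v] is the indirect access: its address is unknown until v has been loaded.
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Edges: 0->1, 0->2, 1->3, 2->3, 3->4
row_offsets = [0, 2, 3, 4, 5, 5]
col_indices = [1, 2, 3, 3, 4]
distances = bfs_csr(row_offsets, col_indices, source=0)
print(distances)  # [0, 1, 1, 2, 3]
```

On large graphs the `dist[v]` lookups jump around memory with little locality, which is why conventional stride-based prefetching tends to help less here.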

The NVIDIA Scalable Coherency Fabric (SCF) connects all cores and a unified cache across a monolithic mesh, delivering predictable latency and 50% faster core-to-core data movement compared with CPUs that fragment compute across dies. For reinforcement learning and agentic AI, that predictability helps keep evaluation loops sustained under full load.

Together, the Olympus core, NVIDIA SCF, and LPDDR5X memory subsystem enable the Vera CPU to deliver more than 1.8x higher sandbox performance across agentic workloads under full load compared with x86-based CPUs, as shown in Figure 4.

System efficiency

Beyond performance, agentic AI places increasing pressure on infrastructure efficiency. As AI factories scale to thousands of CPUs, memory power can become a major contributor to platform power, cooling demand, and operating cost.

The Vera CPU pairs its architecture with high-bandwidth SOCAMM LPDDR5X memory to reduce memory power compared with traditional DDR server designs. The LPDDR5X subsystem typically consumes less than 30 watts, compared with well over 100 watts for DDR5 configurations. MRDIMM-based systems can drive memory power even higher.

With a configurable 250 W to 450 W TDP range, the Vera CPU reduces combined CPU and memory subsystem power while delivering the bandwidth needed for agentic inference and reinforcement learning environments. For AI factories, this translates into better performance per watt, lower operating costs, and more efficient use of power and cooling infrastructure.
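At fleet scale the memory power difference adds up. The sketch below treats the figures quoted above as bounds (under 30 W for LPDDR5X, over 100 W for DDR5) and assumes they apply per CPU socket, which the text does not state explicitly. The fleet size is hypothetical.

```python
LPDDR5X_MEMORY_W = 30  # upper bound taken from "less than 30 watts"
DDR5_MEMORY_W = 100    # lower bound taken from "well over 100 watts"


def min_memory_power_saving_kw(sockets):
    """Lower bound on memory power saved, assuming the quoted figures are per socket."""
    return sockets * (DDR5_MEMORY_W - LPDDR5X_MEMORY_W) / 1000


saving_kw = min_memory_power_saving_kw(sockets=1000)  # hypothetical fleet size
print(f"at least {saving_kw:.0f} kW")  # at least 70 kW
```

This counts memory power only; any knock-on reduction in cooling load would come on top of it.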

The AI factory CPU for agents

The era of agentic AI requires a shift in CPU design, from maximizing cores per dollar to maximizing AI factory output per watt and per dollar. The NVIDIA Vera CPU is the CPU for agents, combining fast per-core performance, high concurrency, and power-efficient memory bandwidth. With the custom Olympus core, LPDDR5X memory, and NVIDIA Scalable Coherency Fabric, the Vera CPU delivers more than 1.8x higher agentic sandbox performance than traditional x86 architectures, helping AI factories complete more tool calls, return more evaluations, and keep accelerators moving.

Learn more about the Vera CPU, the NVIDIA Vera Rubin NVL2, and the Vera CPU benchmarking by Phoronix.

Relative performance based on measured data, and subject to change. NVIDIA Vera CPU with LPDDR5X performance baselined to the latest x86 CPU.


About the Authors

About Praveen Menon
Praveen is a senior technical marketing engineer focused on accelerated computing platforms in the data center at NVIDIA. Previously, Praveen held various roles across marketing, performance, and silicon engineering. He holds a master’s degree in electrical and computer engineering from the University of Arizona.


About Ivan Goldwasser
Ivan leads product marketing for the Data Center CPU products for NVIDIA. Previously, Ivan worked in various marketing and strategy roles in the technology sector. Ivan has an MBA from Georgetown’s McDonough School of Business and a bachelor’s degree in chemical engineering from Texas A&M University.


About Ian Finder
Ian Finder leads product and business for NVIDIA data center CPUs, including Grace and Vera. Before that, he led product and platform architecture for the GPU and accelerated computing portfolio at a large hyperscaler, where he developed, among other things, the virtualized NVIDIA A100-based AI supercomputing infrastructure behind some of the breakout moments in scale-up foundation model training. He holds a Bachelor of Science in Computer Engineering from the Paul G. Allen School of Computer Science and Engineering (CSE) in Seattle. He is the product of a lifelong obsession with computer architecture, a point proven by the Control Data supercomputer occupying his garage.


About Diana Aung
Diana Aung is a senior product manager for the data center CPU product portfolio at NVIDIA. She has more than 12 years of experience in the semiconductor industry and holds degrees in management and computer science from Boston University and New York University.
