The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and real-time decision-making at scale, competitive advantage increasingly depends on the infrastructure that supports them.
Success requires more than raw compute. It demands a scalable, predictable foundation that can orchestrate intelligent agents, manage data movement efficiently, and deliver consistent performance from pilot to production. AI factories powered by NVIDIA bring industrial-grade discipline to AI, turning infrastructure into a strategic engine for speed, reliability, and accelerated innovation.
Infrastructure is one of the five layers of AI and the foundation for AI factories. Building that foundation, however, requires more than selecting high-performance hardware. Enterprises need proven architectural guidance that removes integration risk, reduces time to deployment, and ensures performance at scale. NVIDIA Enterprise Reference Architectures (Enterprise RAs) provide that infrastructure guidance for on-premises deployments, defining how compute, networking, storage, software, and system components integrate into a production-ready AI platform.
With Enterprise RAs, organizations can move from experimentation to scalable AI operations, producing tokens that drive intelligence and business outcomes at an industrial scale. The NVIDIA Enterprise AI Factory validated design completes the picture by curating a full stack of NVIDIA software and NVIDIA-validated ecosystem partner software, enabling enterprises to operationalize the AI factory for their agentic AI workloads.
Based on NVIDIA-Certified Systems and built in collaboration with partners, NVIDIA Enterprise RAs empower enterprises to deploy and scale on-premises AI factories. These RAs provide detailed, end-to-end guidance on everything from GPU count, memory, storage, networking, and observability, to full-stack integration, encompassing hardware, software, orchestration, and monitoring. Once server nodes are NVIDIA-Certified, they form the foundational building blocks for Enterprise RA clusters.

Enterprise RAs form the foundation of AI factories
To get started with building AI factories, three NVIDIA AI factory configurations provide accelerated computing architectures: the NVIDIA RTX PRO AI Factory (with NVIDIA RTX PRO Servers), the NVIDIA HGX AI Factory (with NVIDIA HGX-based systems), and the NVIDIA NVL72 AI Factory (with rack-scale systems based on the NVIDIA GB300 NVL72 platform). Each targets different scales, infrastructure requirements, workloads, and performance objectives.
Organizations can begin with the configuration and architecture that aligns with their immediate needs, and scale as AI ambitions expand. Mature AI deployments often include a blended portfolio of these AI factory configurations to optimize performance across a range of inference, training, and visual computing workloads.
NVIDIA RTX PRO AI Factory: The universal accelerator

The NVIDIA RTX PRO AI Factory, based on the 2-8-5-200 (CPU-GPU-NIC-E/W bandwidth) reference configuration, delivers a modular, power-efficient foundation for enterprise AI. Built around NVIDIA RTX PRO Blackwell Server Edition GPUs, this architecture is optimized for small to medium model inference, fine-tuning, generative AI, visual computing, and industrial AI workloads. It enables enterprises to bring AI closer to core business workflows, supporting multimodal agentic systems, simulation, analytics, and rendering within a standard enterprise data center footprint.
Each NVIDIA-Certified RTX PRO Server integrates up to eight GPUs, delivering high-performance AI compute within a flexible, air-cooled server design. Cluster deployments can scale from tens to hundreds of GPUs, with available examples supporting 128- and 256-GPU cluster environments. High-speed NVIDIA Spectrum-X Ethernet networking and NVIDIA BlueField-3 acceleration enable efficient east-west communication and secure north-south data flow. This creates a foundation for enterprise AI inference, digital twins, visual computing, scientific computing, and data analytics at scale.
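The reference-configuration shorthand can be unpacked with simple arithmetic. The following sketch (illustrative only; the helper functions are hypothetical and not part of any NVIDIA tooling) maps the 2-8-5-200 notation to per-node resources and derives node counts for the 128- and 256-GPU cluster examples mentioned above.

```python
# Illustrative sketch: unpacking the Enterprise RA reference-configuration
# shorthand (CPU-GPU-NIC-E/W bandwidth) into per-node fields and cluster-level
# node counts. The notation and example cluster sizes come from the article;
# this is not an official NVIDIA sizing tool.

def parse_reference_config(shorthand: str) -> dict:
    """Split e.g. '2-8-5-200' into named per-node fields."""
    cpus, gpus, nics, ew_gbps = (int(x) for x in shorthand.split("-"))
    return {"cpus": cpus, "gpus": gpus, "nics": nics, "ew_gbps": ew_gbps}

def nodes_for_cluster(per_node: dict, total_gpus: int) -> int:
    """Number of server nodes needed to reach a target GPU count."""
    return -(-total_gpus // per_node["gpus"])  # ceiling division

rtx_pro = parse_reference_config("2-8-5-200")  # RTX PRO AI Factory config
for cluster_gpus in (128, 256):  # example cluster sizes from the article
    print(cluster_gpus, "GPUs ->", nodes_for_cluster(rtx_pro, cluster_gpus), "nodes")
```

At 8 GPUs per node, the 128- and 256-GPU example clusters correspond to 16 and 32 server nodes, respectively.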
NVIDIA HGX AI Factory: Breakthrough performance for enterprise AI

The NVIDIA HGX AI Factory configuration is the foundation most large enterprises standardize on when training, fine-tuning, and deploying AI models at scale. It’s engineered for continuous operation and balanced performance across training and inference workloads. An enterprise can deploy some clusters based on the HGX AI Factory design and others on the RTX PRO AI Factory design, a mixed approach that NVIDIA IT uses internally to run its own AI factory.
Based on the 2-8-9-800 reference configuration, the NVIDIA HGX AI Factory is designed for organizations training and fine-tuning large language models or running high-throughput AI inference. It enables predictable scaling while maintaining efficiency across compute, memory, and networking for multi-user enterprise environments that require AI performance and operational simplicity.
At its core, the NVIDIA HGX B300 platform integrates eight NVIDIA Blackwell Ultra GPUs connected through fifth-generation NVIDIA NVLink and NVSwitch technology, forming a tightly coupled, high-bandwidth compute domain within each node. With up to 270 GB of HBM3e memory per GPU and up to 2.1 TB of aggregate GPU memory per node, this platform is optimized for large-model training, fine-tuning, and medium- to large-parameter AI inference workloads.
High-speed NVIDIA Spectrum-X Ethernet networking with NVIDIA ConnectX-8 SuperNICs provides up to 800 Gb/s per GPU for east-west communication across clusters, minimizing bottlenecks during distributed training and large-scale inference.
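The per-node totals follow directly from the per-GPU figures quoted above. This short sketch (illustrative arithmetic only, not an official specification) aggregates the stated memory and east-west bandwidth across an eight-GPU HGX B300 node.

```python
# Illustrative arithmetic based on the figures quoted in the article for an
# HGX B300 node: 8 GPUs, up to 270 GB of memory per GPU, and up to 800 Gb/s
# of east-west bandwidth per GPU. Not an official NVIDIA specification.

GPUS_PER_NODE = 8
MEMORY_PER_GPU_GB = 270          # "up to", per the article
EW_BANDWIDTH_PER_GPU_GBPS = 800  # Spectrum-X with ConnectX-8 SuperNICs

aggregate_memory_tb = GPUS_PER_NODE * MEMORY_PER_GPU_GB / 1000
aggregate_ew_gbps = GPUS_PER_NODE * EW_BANDWIDTH_PER_GPU_GBPS

# 2.16 TB, consistent with the ~2.1 TB aggregate quoted in the article
print(f"Aggregate GPU memory per node: {aggregate_memory_tb:.2f} TB")
print(f"Aggregate east-west bandwidth per node: {aggregate_ew_gbps} Gb/s")
```

The 6,400 Gb/s of aggregate east-west bandwidth per node is what allows distributed training traffic to scale with GPU count rather than funnel through a shared uplink.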
NVIDIA NVL72 AI Factory: Powering exascale AI

The NVIDIA NVL72 AI Factory stands as one of the most advanced rack-scale platforms. Built for the era of trillion-parameter models and AI reasoning systems, it features the NVIDIA GB300 NVL72 system to deliver top performance per rack while maximizing efficiency across compute, memory, and networking resources.
It’s engineered for organizations that demand massive scalability without compromising predictability or time-to-value. The architecture is optimized for intensive enterprise AI workloads, including large-scale foundation model training, fine-tuning, high-throughput multi-tenant inference, and complex agentic AI pipelines.
The NVL72 AI Factory is an integrated, liquid-cooled rack-scale system combining 36 Grace CPUs and 72 Blackwell Ultra GPUs, interconnected through fifth-generation NVLink. Every GPU communicates with every other GPU through a unified, high-bandwidth NVLink fabric, enabling the rack to function as a single, coherent compute domain. This tightly coupled design minimizes communication latency and eliminates bottlenecks common in traditional cluster architectures. Integrated NVIDIA ConnectX-8 SuperNICs ensure high-throughput east-west traffic for AI training and inference, while NVIDIA BlueField DPUs streamline north-south data flow. Together, they allow the entire rack to operate as a cohesive, data-center-scale supercomputer.
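The rack composition quoted above can be sanity-checked with simple arithmetic. This sketch (illustrative only, derived solely from the CPU and GPU counts in the text) shows the GPU-to-CPU ratio and how many distinct GPU pairs a unified all-to-all NVLink fabric connects.

```python
# Illustrative sketch derived from the rack composition quoted in the article
# (36 Grace CPUs, 72 Blackwell Ultra GPUs, all-to-all NVLink fabric).
# Not an official NVIDIA specification.

GRACE_CPUS = 36
BLACKWELL_GPUS = 72

# GPU-to-CPU ratio within the rack
gpus_per_cpu = BLACKWELL_GPUS // GRACE_CPUS  # 2 GPUs per Grace CPU

# In a unified all-to-all fabric, every GPU can reach every other GPU
# directly, so the number of distinct GPU pairs is n * (n - 1) / 2.
gpu_pairs = BLACKWELL_GPUS * (BLACKWELL_GPUS - 1) // 2  # 2,556 pairs

print(f"{gpus_per_cpu} GPUs per Grace CPU, {gpu_pairs} directly reachable GPU pairs")
```

That every one of the 2,556 GPU pairs communicates over the same NVLink fabric, rather than crossing a node-level network boundary, is what lets the rack behave as a single compute domain.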
AI factory configurations based on NVIDIA Enterprise Reference Architectures (RAs) provide the architectural foundation, but validated implementations from our system partners are what establish confidence. Our system partners use these RAs to build solutions that undergo a technical review by the NVIDIA Design Review Board (DRB), where their designs are assessed against NVIDIA-defined criteria and standards.
Some partners validate specific layers of the stack, while others validate full end-to-end systems across hardware, software, and networking. Designs that meet these requirements are recognized as NVIDIA-endorsed solutions, and a current list of endorsed partners and their offerings is available on the NVIDIA Enterprise RA documentation page.
Global system partners are delivering Enterprise RA-based solutions tested across a range of scale points—from small pilot deployments to large AI factory clusters. This ecosystem approach gives enterprises transparency, choice, and confidence.
Faster deployment and lower TCO
Enterprise RAs extend beyond systems engineering, serving as actionable recipes for accelerated deployment and long-term efficiency. They are designed to help organizations:
- Cut through infrastructure indecision.
- Reduce redesign cycles and operational overhead.
- Compress deployment timelines from months to weeks.
- Optimize utilization and long-term TCO.
- Maximize uptime and optimize performance with Enterprise Support.
More than technical guidance, they enable enterprises to move from proof of concept to production with clarity and confidence. When combined with the software architecture and recommendations from the NVIDIA Enterprise AI Factory validated design, organizations have the full-stack support and guidance to deploy an on-premises AI factory, deliver faster time-to-value, and drive business innovation with AI.
Ready to get started?
Ask your system manufacturer or specialization partner for an NVIDIA Enterprise RA–based solution design today.
- Read the NVIDIA-Certified Systems white paper.
- Read the NVIDIA Enterprise RA white paper.
- Read the NVIDIA Enterprise AI Factory validated design guide.
- Learn more about NVIDIA-Certified Systems and NVIDIA Enterprise RAs.