As demand for AI continues to grow, hyperscalers are looking for ways to accelerate the deployment of specialized, high-performance AI infrastructure.
Announced today at AWS re:Invent, Amazon Web Services is collaborating with NVIDIA to integrate NVIDIA NVLink Fusion — a rack-scale platform that lets industries build custom AI rack infrastructure with NVIDIA NVLink scale-up interconnect technology and a vast ecosystem of partners — to accelerate deployment of its new Trainium4 AI chips, Graviton CPUs, Elastic Fabric Adapters (EFAs) and the Nitro System virtualization infrastructure.
AWS is designing Trainium4 to integrate with NVLink 6 and the NVIDIA MGX rack architecture, the first phase of a multigenerational collaboration between NVIDIA and AWS on NVLink Fusion.
With best-in-class scale-up networking, a complete technology stack and a comprehensive ecosystem of partners building on the technology, NVLink Fusion boosts performance, increases return on investment, reduces deployment risks and accelerates time to market for custom AI silicon.
Challenges to deploying custom AI silicon
AI workloads are getting larger, models are becoming more complex, and the pressure to rapidly deploy AI compute infrastructure that meets the needs of the growing market is higher than ever.
Emerging workloads like planning, reasoning and agentic AI run on models with hundreds of billions to trillions of parameters, often built on mixture-of-experts (MoE) architectures. These workloads require many systems with many accelerators working in parallel, all connected in a single fabric.
Meeting these demands requires a scale-up network, like NVLink, to connect entire racks of accelerators together with a high-bandwidth, low-latency interconnect.
Hyperscalers face several challenges when deploying such specialized solutions:
- Long development cycles for rack-scale architecture: In addition to designing a custom AI chip, hyperscalers need to develop a scale-up networking solution, scale-out and storage networking, and a rack design including trays, cooling, power delivery, system management and AI acceleration software. This can cost billions of dollars and take years to deploy.
- Managing a complex supplier ecosystem: Manufacturing a full-rack architecture requires a complex supplier ecosystem for CPUs and GPUs, scale-up networking, scale-out networking, racks and trays, as well as busbars, power shelves, power whips, cold plates, coolant distribution units and quick disconnects. Managing dozens of suppliers and hundreds of thousands of components is incredibly complex, and a single supply delay or component change can put an entire project at risk.
NVLink Fusion addresses these challenges, helping hyperscalers remove networking performance bottlenecks, reduce deployment risks and accelerate time to market for custom AI silicon.
NVLink Fusion enables custom AI infrastructure
NVLink Fusion offers a rack-scale AI infrastructure platform that enables hyperscalers and custom ASIC designers to integrate custom ASICs with NVLink and the OCP MGX rack-scale server architecture.
Boost performance with NVLink 6 scale-up networking
At the core of NVLink Fusion is the NVLink Fusion chiplet. Hyperscalers can drop the chiplet into their custom ASIC designs to connect to the NVLink scale-up interconnect and NVLink Switch. The NVLink Fusion technology portfolio includes the Vera Rubin NVLink Switch tray with the sixth-generation NVLink Switch and 400G custom SerDes. It enables NVLink Fusion adopters to connect up to 72 custom ASICs all-to-all at 3.6 TB/s per ASIC, for a total of 260 TB/s of scale-up bandwidth.
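As a quick check on the arithmetic, the aggregate figure follows directly from the per-ASIC bandwidth:

```latex
\[
\underbrace{72}_{\text{ASICs}} \times \underbrace{3.6~\text{TB/s}}_{\text{per ASIC}} = 259.2~\text{TB/s} \approx 260~\text{TB/s of scale-up bandwidth}
\]
```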

NVLink Switch enables peer-to-peer memory access using direct loads, stores, and atomic operations, as well as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) for in-network reductions and multicast acceleration.
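For a sense of how peer-to-peer memory access looks to software, below is a minimal sketch using the standard CUDA runtime peer-access API, the mechanism through which direct loads, stores and copies between accelerators are exposed; NVLink provides the fast path when present. The device IDs and buffer size are illustrative assumptions, and SHARP in-network reductions are typically reached through collective libraries such as NCCL rather than invoked directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Check whether GPU 0 can directly address GPU 1's memory.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
    if (!canAccess) {
        printf("No direct peer access between GPU 0 and GPU 1\n");
        return 1;
    }

    const size_t bytes = 1 << 20;  // 1 MiB test buffer (illustrative)
    float *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // enable access to GPU 1; flags must be 0
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);  // enable access back to GPU 0
    cudaMalloc(&buf1, bytes);

    // Direct device-to-device copy; this travels over NVLink when available.
    // Once peer access is enabled, kernels on either GPU could likewise
    // dereference the remote pointer with ordinary loads, stores and atomics.
    cudaMemcpyPeer(buf1, /*dstDevice=*/1, buf0, /*srcDevice=*/0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```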
Unlike other scale-up networking approaches, NVLink is a proven, widely adopted technology. And combined with NVIDIA AI acceleration software, NVLink Switch delivers up to 3x the performance and revenue for AI inference¹ by connecting 72 accelerators in a single scale-up domain.

Reduce development costs and accelerate time to market with proven architecture and ecosystem
NVLink Fusion adopters can tap into a modular portfolio of AI factory technology, including NVIDIA MGX rack architecture, GPUs, NVIDIA Vera CPUs, co-packaged optics switches, NVIDIA ConnectX SuperNICs, NVIDIA BlueField DPUs and NVIDIA Mission Control software, along with an ecosystem of ASIC designers, CPU and IP providers, and manufacturers.
This technology portfolio enables hyperscalers to reduce development costs and time to market compared with sourcing and integrating their own technology stack.
AWS is also harnessing the NVLink Fusion ecosystem of OEMs, ODMs and suppliers, which provides all the components required for a full rack-scale deployment, from the rack and chassis to power-delivery and cooling systems. This ecosystem lets hyperscalers eliminate many of the risks associated with rack-scale deployments.
Heterogeneous AI silicon, single rack-scale infrastructure
NVLink Fusion also allows AWS to build a heterogeneous silicon offering on the same footprint, cooling system and power-distribution designs as the AI factories it already deploys.
NVLink Fusion adopters can use as little or as much of the platform as they want, and each piece can help them quickly scale up to meet the demands of intensive inference and agentic AI model training workloads.
Bringing custom AI chips to market is hard. NVLink Fusion enables hyperscalers and custom ASIC designers to build on the proven NVIDIA MGX rack architecture and NVLink scale-up networking. By adopting NVLink Fusion for Trainium4 deployment, AWS will drive faster innovation cycles and accelerate time to market.
Learn more about NVLink Fusion.
¹ 3x performance increase based on fifth-generation NVLink, comparing GB200 NVL72 with B200 NVL8, both with NVLink Switch.