Posts by Amr Elmeleegy
Generative AI
Nov 01, 2024
3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...
Generative AI
Oct 28, 2024
NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models
Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...
Data Center / Cloud
Oct 15, 2024
NVIDIA Contributes NVIDIA GB200 NVL72 Designs to Open Compute Project
During the 2024 OCP Global Summit, NVIDIA announced that it has contributed the NVIDIA GB200 NVL72 rack and compute and switch tray liquid-cooled designs to the...
Data Center / Cloud
Sep 24, 2024
NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1
In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding...
Generative AI
Aug 28, 2024
NVIDIA Triton Inference Server Achieves Outstanding Performance in MLPerf Inference 4.1 Benchmarks
Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use...
Data Center / Cloud
Aug 20, 2024
NVIDIA GH200 Superchip Delivers Breakthrough Energy Efficiency and Node Consolidation for Apache Spark
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise...