Posts by Ashraf Eassa
Generative AI
Dec 17, 2024
Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding
Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only...
8 MIN READ
Top Stories
Nov 19, 2024
Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs
Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are...
6 MIN READ
Data Center / Cloud
Nov 13, 2024
NVIDIA Blackwell Doubles LLM Training Performance in MLPerf Training v4.1
As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance,...
8 MIN READ
Generative AI
Nov 01, 2024
3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...
5 MIN READ
Generative AI
Oct 28, 2024
NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models
Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...
7 MIN READ
Data Center / Cloud
Oct 09, 2024
NVIDIA Grace CPU Delivers World-Class Data Center Performance and Breakthrough Energy Efficiency
NVIDIA designed the NVIDIA Grace CPU to be a new kind of high-performance, data center CPU—one built to deliver breakthrough energy efficiency and optimized...
8 MIN READ