Cloud Services
Nov 15, 2024
NVIDIA NIM 1.4 Ready to Deploy with 2.4x Faster Inference
The demand for ready-to-deploy high-performance inference is growing as generative AI reshapes industries. NVIDIA NIM provides production-ready microservice...
3 MIN READ
Nov 15, 2024
Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill
In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment...
4 MIN READ
Nov 14, 2024
NVIDIA DOCA 2.9 Enhances AI and Cloud Computing Infrastructure with New Performance and Security Features
NVIDIA DOCA enhances the capabilities of NVIDIA networking platforms by providing a comprehensive software framework for developers to leverage hardware...
9 MIN READ
Nov 01, 2024
3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...
5 MIN READ
Oct 28, 2024
NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models
Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...
7 MIN READ
Oct 24, 2024
Building AI Agents to Automate Software Test Case Creation
In software development, testing is crucial for ensuring the quality and reliability of the final product. However, creating test plans and specifications can...
15 MIN READ
Oct 21, 2024
IBM’s New Granite 3.0 Generative AI Models Are Small, Yet Highly Accurate and Efficient
Today, IBM released the third generation of IBM Granite, a collection of open language models and complementary tools. Prior generations of Granite focused on...
5 MIN READ
Oct 15, 2024
Supermicro Launches NVIDIA BlueField-Powered JBOF to Optimize AI Storage
The growth of AI is driving exponential growth in computing power and a doubling of networking speeds every few years. Less well-known is that it’s also...
6 MIN READ
Oct 15, 2024
Powering Next-Generation AI Networking with NVIDIA SuperNICs
In the era of generative AI, accelerated networking is essential to build high-performance computing fabrics for massively distributed AI workloads. NVIDIA...
6 MIN READ
Oct 15, 2024
NVIDIA Contributes NVIDIA GB200 NVL72 Designs to Open Compute Project
During the 2024 OCP Global Summit, NVIDIA announced that it has contributed the NVIDIA GB200 NVL72 rack and compute and switch tray liquid cooled designs to the...
10 MIN READ
Oct 09, 2024
Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch
The continued growth of LLM capabilities, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of...
8 MIN READ
Oct 07, 2024
Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries
Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft's TuringMM...
11 MIN READ
Sep 26, 2024
Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...
8 MIN READ
Sep 17, 2024
Optimizing Data Center Performance with AI Agents and the OODA Loop Strategy
For any data center, operating large, complex GPU clusters is not for the faint of heart! There is a tremendous amount of complexity. Cooling, power,...
12 MIN READ
Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ
Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ