Cloud Services
Oct 09, 2024
Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch
The continued growth of LLM capability, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of...
8 MIN READ
Oct 07, 2024
Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries
Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft's TuringMM...
11 MIN READ
Sep 26, 2024
Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...
8 MIN READ
Sep 17, 2024
Optimizing Data Center Performance with AI Agents and the OODA Loop Strategy
For any data center, operating large, complex GPU clusters is not for the faint of heart! There is a tremendous amount of complexity. Cooling, power,...
12 MIN READ
Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ
Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ
Aug 15, 2024
NVIDIA TensorRT Model Optimizer v0.15 Boosts Inference Performance and Expands Model Support
NVIDIA has announced the latest v0.15 release of NVIDIA TensorRT Model Optimizer, a state-of-the-art quantization toolkit of model optimization techniques...
5 MIN READ
Aug 14, 2024
Just Released: DOCA 2.8 Software Framework
The new release includes support for Spectrum-X 1.1 RA and new features for AI Cloud Data Centers.
1 MIN READ
Aug 07, 2024
Profit and Loss Modeling on GPUs with ISO C++ Language Parallelism
The previous post, How to Accelerate Quantitative Finance with ISO C++ Standard Parallelism, demonstrated how to write a Black-Scholes simulation using ISO C++...
10 MIN READ
Aug 06, 2024
Spotlight: NVIDIA BlueField DPUs Power the VAST Data Platform for AI Workload Optimization
As the demand for sophisticated AI capabilities escalates, VAST Data introduces the VAST Data Platform, now enhanced with NVIDIA BlueField DPUs. This innovation...
7 MIN READ
Aug 06, 2024
A Deep Dive into the Latest AI Models Optimized with NVIDIA NIM
Delivered as optimized containers, NVIDIA NIM microservices are designed to accelerate AI application development for businesses of all sizes, paving the way...
9 MIN READ
Aug 01, 2024
Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API
NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and...
6 MIN READ
Jul 30, 2024
Enhancing RAG Pipelines with Re-Ranking
In the rapidly evolving landscape of AI-driven applications, re-ranking has emerged as a pivotal technique to enhance the precision and relevance of enterprise...
8 MIN READ
Jul 30, 2024
Empowering Energy Trading with MetDesk and NVIDIA Earth-2
Despite the continuous improvement of weather forecasts over the last few decades, uncertainties due to meteorological measurements and models mean that...
13 MIN READ
Jul 23, 2024
Accelerate AI Infrastructure Using an NVIDIA BlueField-3 DPU Integration with DDN Storage
As AI becomes integral to organizational innovation and competitive advantage, the need for efficient and scalable infrastructure is more critical than ever. A...
6 MIN READ
Jul 15, 2024
Power Your AI Projects with New NVIDIA NIMs for Mistral and Mixtral Models
Large language models (LLMs) are growing in adoption across enterprise organizations, with many building them into their AI applications. Foundation models are...
5 MIN READ