Ampere

May 28, 2023
NVIDIA AX800 Delivers High-Performance 5G vRAN and AI Services on One Common Cloud Infrastructure
The pace of 5G investment and adoption is accelerating. According to the GSMA Mobile Economy 2023 report, nearly $1.4 trillion will be spent on 5G CAPEX,...
11 MIN READ

Feb 02, 2023
Benchmarking Deep Neural Networks for Low-Latency Trading and Rapid Backtesting on NVIDIA GPUs
Lowering response times to new market events is a driving force in algorithmic trading. Latency-sensitive trading firms keep up with the ever-increasing pace of...
8 MIN READ
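
To give a flavor of the kind of measurement involved, here is a minimal, hypothetical latency microbenchmark in PyTorch: it times single-sample inference for a small placeholder network with CUDA events. The model, shapes, and iteration counts are made up for illustration and are not the harness used in the post.

```python
import torch

# Placeholder model standing in for a trading DNN; layer sizes are arbitrary.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
).cuda().eval()

x = torch.randn(1, 128, device="cuda")  # one "market event" feature vector

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

latencies_ms = []
with torch.no_grad():
    for _ in range(100):                    # warmup so one-time setup is excluded
        model(x)
    torch.cuda.synchronize()
    for _ in range(1000):
        start.record()
        model(x)
        end.record()
        torch.cuda.synchronize()            # wait for the GPU work to finish
        latencies_ms.append(start.elapsed_time(end))  # milliseconds

latencies_ms.sort()
print(f"p50: {latencies_ms[499]:.3f} ms   p99: {latencies_ms[989]:.3f} ms")
```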

Aug 30, 2022
Dividing NVIDIA A30 GPUs and Conquering Multiple Workloads
Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each...
9 MIN READ
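
For context, the sketch below (an assumption-laden example, not code from the post) uses the nvidia-ml-py (pynvml) bindings to list the MIG instances NVML exposes on each GPU. Actually creating or destroying MIG instances is an administrative step, typically done with nvidia-smi or the GPU Operator, and is not shown here.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(gpu)
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
        except pynvml.NVMLError:
            current = 0  # GPU does not support MIG
        print(f"GPU {i}: {name}, MIG enabled: {bool(current)}")
        if not current:
            continue
        for j in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, j)
            except pynvml.NVMLError:
                continue  # no MIG device at this index
            uuid = pynvml.nvmlDeviceGetUUID(mig)
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"  MIG {j}: {uuid}, {mem.total // 2**20} MiB")
finally:
    pynvml.nvmlShutdown()
```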

Jun 16, 2022
Accelerating Quantized Networks with the NVIDIA QAT Toolkit for TensorFlow and NVIDIA TensorRT
We’re excited to announce the NVIDIA Quantization-Aware Training (QAT) Toolkit for TensorFlow 2, with the goal of accelerating quantized networks with...
9 MIN READ
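
As a rough illustration of what quantization-aware training looks like in TensorFlow 2, the sketch below uses the generic TensorFlow Model Optimization Toolkit API as a stand-in; the NVIDIA QAT Toolkit announced in the post provides its own entry points and TensorRT-oriented quantization defaults that this example does not reproduce.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy model; the real workflow would start from a trained FP32 network.
base = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Wrap the model with fake-quantization nodes so training learns
# quantization-friendly weights and activation ranges.
qat_model = tfmot.quantization.keras.quantize_model(base)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Random stand-in data for a short fine-tuning run.
x = tf.random.normal((256, 32))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
qat_model.fit(x, y, epochs=1, batch_size=32)
```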

Jun 02, 2022
Fueling High-Performance Computing with Full-Stack Innovation
High-performance computing (HPC) has become the essential instrument of scientific discovery. Whether it is discovering new, life-saving drugs, battling...
8 MIN READ

May 25, 2022
Training a State-of-the-Art ImageNet-1K Visual Transformer Model using NVIDIA DGX SuperPOD
Recent work has demonstrated that large transformer models can achieve or advance the SOTA in computer vision tasks such as semantic segmentation and object...
9 MIN READ
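
For orientation, here is a heavily simplified, single-GPU training step for a Vision Transformer from torchvision using random stand-in data; the post's DGX SuperPOD recipe (distributed data parallelism, mixed precision, the full ImageNet-1K pipeline) is not reproduced here.

```python
import torch
import torchvision

# ViT-B/16 with an ImageNet-1K classification head, randomly initialized.
model = torchvision.models.vit_b_16(num_classes=1000).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# Random tensors standing in for one ImageNet batch.
images = torch.randn(8, 3, 224, 224, device="cuda")
labels = torch.randint(0, 1000, (8,), device="cuda")

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```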

May 11, 2022
Accelerating AI Inference Workloads with NVIDIA A30 GPU
NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC...
6 MIN READ

Sep 08, 2021
Register for the NVIDIA Metropolis Developer Webinars on Sept. 22
Join NVIDIA experts and Metropolis partners on Sept. 22 for webinars exploring developer SDKs, GPUs, go-to-market opportunities, and more. All three sessions,...
2 MIN READ

Aug 25, 2021
Deploying NVIDIA Triton at Scale with MIG and Kubernetes
NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production. Clients...
24 MIN READ
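
To give a sense of the client side, the sketch below sends one inference request to a Triton server with the tritonclient HTTP API. The URL, model name, and tensor names are placeholders; in the Kubernetes deployment the post walks through, the URL would point at a Service load-balancing across Triton pods backed by MIG instances.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder endpoint; in-cluster this would be the Triton Service address.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder model and tensor names; they must match the model's config.pbtxt.
inputs = httpclient.InferInput("INPUT__0", [1, 3, 224, 224], "FP32")
inputs.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
outputs = httpclient.InferRequestedOutput("OUTPUT__0")

result = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
print(result.as_numpy("OUTPUT__0").shape)
```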

Jul 29, 2021
Discovering New Features in CUDA 11.4
NVIDIA announces the newest release of the CUDA development environment, CUDA 11.4. This release includes GPU-accelerated libraries, debugging and optimization...
14 MIN READ

Jul 20, 2021
Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated)
This post was originally published in August 2019 and has been updated for NVIDIA TensorRT 8.0. Large-scale language models (LSLMs) such as BERT, GPT-2, and...
18 MIN READ
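
As a minimal sketch of the TensorRT 8 Python workflow (not the post's full BERT pipeline), the snippet below parses an ONNX model and builds an FP16 engine; the file name bert.onnx is a placeholder.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse an exported BERT model; "bert.onnx" is a placeholder path. Assumes
# static input shapes; dynamic shapes would also need an optimization profile.
with open("bert.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # use FP16 Tensor Cores where possible

# Serialize the optimized engine so it can be deserialized at inference time.
engine_bytes = builder.build_serialized_network(network, config)
with open("bert_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```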

Apr 15, 2021
Using Tensor Cores in CUDA Fortran
Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit)...
28 MIN READ

Mar 19, 2021
Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores
Sparse-matrix dense-matrix multiplication (SpMM) is a fundamental linear algebra operation and a building block for more complex algorithms such as finding the...
7 MIN READ
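
As a purely illustrative, CPU-side analogue of the data layout involved (not the cuSPARSE Block-SpMM API covered in the post), the sketch below multiplies a block-sparse matrix stored in SciPy's BSR format by a dense matrix.

```python
import numpy as np
from scipy.sparse import bsr_matrix

block = 16                                   # block size; value chosen arbitrarily
dense_blocks = np.random.rand(3, block, block).astype(np.float32)
indices = np.array([0, 2, 1])                # column index of each nonzero block
indptr = np.array([0, 1, 2, 3])              # one nonzero block per block-row
A = bsr_matrix((dense_blocks, indices, indptr), shape=(3 * block, 3 * block))

B = np.random.rand(3 * block, 64).astype(np.float32)  # dense right-hand side
C = A @ B                                    # SpMM: sparse A times dense B
print(C.shape, f"stored fraction: {A.nnz / (A.shape[0] * A.shape[1]):.2f}")
```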

Jan 27, 2021
Accelerating AI Training with NVIDIA TF32 Tensor Cores
NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and...
10 MIN READ
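
The PyTorch sketch below (an illustration, not code from the post) toggles the TF32 matmul flag to compare a large matrix multiply with and without TF32 Tensor Cores on an Ampere-class GPU.

```python
import time
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

def avg_matmul_seconds(iters: int = 10) -> float:
    a @ b                                   # warmup run, excluded from timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

torch.backends.cuda.matmul.allow_tf32 = False   # full-precision FP32 math
fp32_s = avg_matmul_seconds()
torch.backends.cuda.matmul.allow_tf32 = True    # allow TF32 Tensor Cores (Ampere and later)
tf32_s = avg_matmul_seconds()
print(f"FP32: {fp32_s * 1e3:.2f} ms   TF32: {tf32_s * 1e3:.2f} ms")
```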

Jan 26, 2021
Adding More Support in NVIDIA GPU Operator
Editor's note: Interested in GPU Operator? Register for our upcoming webinar on January 20th, "How to Easily use GPUs with Kubernetes". Reliably provisioning...
6 MIN READ

Dec 18, 2020
Minimizing Deep Learning Inference Latency with NVIDIA Multi-Instance GPU
Recently, NVIDIA unveiled the A100 GPU model, based on the NVIDIA Ampere architecture. Ampere introduced many features, including Multi-Instance GPU (MIG), that...
20 MIN READ
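
A common pattern with MIG for latency isolation, sketched below under the assumption of a MIG-enabled GPU, is to pin each inference process to a single MIG instance by setting CUDA_VISIBLE_DEVICES to that instance's UUID before any CUDA context is created. The UUID shown is a placeholder; nvidia-smi -L lists the real ones.

```python
import os

# Placeholder UUID; replace with an actual MIG instance UUID from `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Import the framework only after setting the variable so the CUDA runtime
# sees just that one MIG instance.
import torch

print(torch.cuda.device_count())        # expected: 1 (the selected MIG instance)
print(torch.cuda.get_device_name(0))    # e.g. an A100 MIG profile name
```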