Tensor Cores

Feb 25, 2026

Making Softmax More Efficient with NVIDIA Blackwell Ultra

LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query...

10 MIN READ

Jun 16, 2025

AI Aims to Bring Order to the Law

A team of Stanford University researchers has developed an LLM system to cut through bureaucratic red tape. The LLM—dubbed the System for Statutory Research,...

4 MIN READ

Jun 10, 2025

How Modern Supercomputers Powered by NVIDIA Are Pushing the Limits of Speed — and Science

Modern high-performance computing (HPC) is enabling more than just quick calculations — it’s powering AI systems that are unlocking scientific...

6 MIN READ

Jun 08, 2025

AI Helps Locate Dangerous Fishing Nets Lost at Sea

Conservationists have launched a new AI tool that can sift through petabytes of underwater imaging from anywhere in the world to identify signs of abandoned or...

4 MIN READ

Jun 04, 2025

Floating-Point 8: An Introduction to Efficient, Lower-Precision AI Training

With the growth of large language models (LLMs), deep learning is advancing both model architecture design and computational efficiency. Mixed precision...

11 MIN READ

Jan 15, 2025

GPU Memory Essentials for AI Performance

Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging...

6 MIN READ

Mar 27, 2024

NVIDIA H200 Tensor Core GPUs and NVIDIA TensorRT-LLM Set MLPerf LLM Inference Records

Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI...

11 MIN READ

Apr 05, 2023

Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI

The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment...

15 MIN READ

Mar 22, 2022

NVIDIA Hopper Architecture In-Depth

Today during the 2022 NVIDIA GTC Keynote address, NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU based on the new NVIDIA Hopper GPU...

36 MIN READ

Sep 24, 2021

Explore and Test Experimental Models for DLSS Research

Today, NVIDIA is enabling developers to explore and evaluate experimental AI models for Deep Learning Super Sampling (DLSS). Developers can download...

2 MIN READ

Jul 20, 2021

Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT

This post was updated July 20, 2021 to reflect NVIDIA TensorRT 8.0 updates. When deploying a neural network, it's useful to think about how the network could...

8 MIN READ

Feb 17, 2021

Tips: Getting the Most out of the DLSS Unreal Engine 4 Plugin

DLSS is a deep learning, super-resolution network that boosts frame rates by rendering fewer pixels and then using AI to construct sharp, higher-resolution...

5 MIN READ

Jan 27, 2021

Accelerating AI Training with NVIDIA TF32 Tensor Cores

The NVIDIA Ampere GPU architecture introduced the third generation of Tensor Cores, with the new TensorFloat32 (TF32) mode for accelerating FP32 convolutions and...

10 MIN READ

Aug 07, 2020

Bringing Tensor Cores to Standard Fortran

Tuned math libraries are an easy and dependable way to extract the ultimate performance from your HPC system. However, for long-lived applications or those...

10 MIN READ

Jul 24, 2020

Accelerating TensorFlow on NVIDIA A100 GPUs

The NVIDIA A100, based on the NVIDIA Ampere GPU architecture, offers a suite of exciting new features: third-generation Tensor Cores, Multi-Instance GPU (MIG)...

12 MIN READ

May 14, 2020

Defining AI Innovation with NVIDIA DGX A100

Organizations of all kinds are incorporating AI into their research, development, product, and business processes. This helps them meet and exceed their...

15 MIN READ