Transformers
Jul 11, 2024
Next Generation of FlashAttention
NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent work exploiting the Hopper GPU architecture and...
1 MIN READ
Jun 12, 2024
Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates
The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...
7 MIN READ
Jan 29, 2024
Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network
The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs)...
13 MIN READ
Nov 29, 2023
New Course: Introduction to Transformer-Based Natural Language Processing
Learn how transformers are used as the building blocks of modern large language models in this new self-paced course.
1 MIN READ
Nov 17, 2023
Mastering LLM Techniques: Inference Optimization
Stacking transformer layers to create large models results in higher accuracy, few-shot learning capabilities, and even near-human emergent abilities on a...
25 MIN READ
Nov 08, 2023
New Workshop: Rapid Application Development Using Large Language Models
Interested in developing LLM-based applications? Get started with this exploration of the open-source ecosystem.
1 MIN READ
Oct 24, 2023
Webinar: Transform Your Vision AI Applications with Generative AI
Explore new generative AI models from NVIDIA that will have a major impact on your vision AI developer stack.
1 MIN READ
Jul 25, 2023
Improve Accuracy and Robustness of Vision AI Apps with Vision Transformers and NVIDIA TAO
Vision Transformers (ViTs) are taking computer vision by storm, offering incredible accuracy, robust solutions for challenging real-world scenarios, and...
5 MIN READ
Jun 21, 2023
Webinar: Unleash the Power of Vision Transformers
Learn how Vision Transformers are revolutionizing AI applications with image understanding and analysis.
1 MIN READ
May 15, 2023
Efficiently Scale LLM Training Across a Large GPU Cluster with Alpa and Ray
Recent years have seen a proliferation of large language models (LLMs) that extend beyond traditional language tasks to generative AI. This includes models like...
16 MIN READ
Feb 01, 2023
New cuBLAS 12.0 Features and Matrix Multiplication Performance on NVIDIA Hopper GPUs
The NVIDIA H100 Tensor Core GPU, based on the NVIDIA Hopper architecture with the fourth generation of NVIDIA Tensor Cores, recently debuted, delivering...
10 MIN READ
Sep 14, 2022
NVIDIA, Arm, and Intel Publish FP8 Specification for Standardization as an Interchange Format for AI
AI processing requires full-stack innovation across hardware and software platforms to address the growing computational demands of neural networks. A key area...
4 MIN READ
Sep 12, 2022
Improving Japanese Language ASR by Combining Convolutions with Attention Mechanisms
Automatic speech recognition (ASR) research generally focuses on high-resource languages such as English, which is supported by hundreds of thousands of hours...
5 MIN READ
Aug 03, 2022
Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server
This is the first part of a two-part series discussing the NVIDIA Triton Inference Server’s FasterTransformer (FT) library, one of the fastest libraries for...
10 MIN READ
Aug 03, 2022
Deploying GPT-J and T5 with NVIDIA Triton Inference Server
This is the second part of a two-part series about NVIDIA tools that allow you to run large transformer models for accelerated inference. For an introduction to...
16 MIN READ
Jul 28, 2022
NVIDIA AI Platform Delivers Big Gains for Large Language Models
As the size and complexity of large language models (LLMs) continue to grow, NVIDIA is today announcing updates to the NeMo framework that provide training...
7 MIN READ