cuBLAS

Jun 16, 2026

NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance

NVIDIA delivered a clean sweep in MLPerf Training v6.0, the latest edition of industry-standard AI training benchmarks developed by the MLCommons consortium....

11 MIN READ

Mar 09, 2026

CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features

CUDA 13.2 arrives with a major update: NVIDIA CUDA Tile is now supported on devices of compute capability 8.X architectures (NVIDIA Ampere and NVIDIA Ada), as...

15 MIN READ

Dec 04, 2025

NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release, you’ll...

11 MIN READ

Oct 28, 2025

Introducing the CodonFM Open Model for RNA Design and Analysis

Open research is critical for driving innovation, and many breakthroughs in AI and science are achieved through open collaboration. In the field of digital...

10 MIN READ

Oct 24, 2025

Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS

NVIDIA CUDA-X math libraries provide the fundamental numerical building blocks that enable developers to deploy accelerated applications across multiple...

11 MIN READ

Jun 04, 2025

NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0

The journey to create a state-of-the-art large language model (LLM) begins with a process called pretraining. Pretraining a state-of-the-art model is...

12 MIN READ

May 01, 2025

Boosting Matrix Multiplication Speed and Flexibility with NVIDIA cuBLAS 12.9

The NVIDIA CUDA-X math libraries empower developers to build accelerated applications for AI, scientific computing, data processing, and more. Two...

8 MIN READ

Feb 04, 2025

New AI Model Offers Cellular-Level View of Cancerous Tumors

Researchers studying cancer unveiled a new AI model that provides cellular-level mapping and visualizations of cancer cells, which scientists hope can shed...

3 MIN READ

Dec 14, 2024

Introducing Tile-Based Programming in Warp 1.5.0

With the latest release of Warp 1.5.0, developers now have access to new tile-based programming primitives in Python. Leveraging cuBLASDx and cuFFTDx, these...

14 MIN READ

Nov 18, 2024

Fusing Epilog Operations with Matrix Multiplication Using nvmath-python

nvmath-python (Beta) is an open-source Python library, providing Python programmers with access to high-performance mathematical operations from NVIDIA CUDA-X...

8 MIN READ

Oct 09, 2024

Just Released: Updated Math Libraries in CUDA Toolkit 12.6.2

CUDA Toolkit 12.6.2 improves performance and provides new features in cuBLAS, cuSOLVER, and cuFFT LTO libraries.

1 MIN READ

Aug 01, 2024

Just Released: CUDA Toolkit 12.6

The release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, cuSPARSE, as well as the release of Nsight Compute 2024.3.

1 MIN READ

Jun 12, 2024

Introducing Grouped GEMM APIs in cuBLAS and More Performance Updates

The latest release of the NVIDIA cuBLAS library, version 12.5, continues to deliver functionality and performance to deep learning (DL) and high-performance...

7 MIN READ

Feb 01, 2024

Just Released: NVIDIA HPC SDK v24.1

This NVIDIA HPC SDK update includes the cuBLASMp preview library, along with minor bug fixes and enhancements.

1 MIN READ

Jan 12, 2024

Just Released: cuBLASDx

cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview...

1 MIN READ

Dec 20, 2023

Just Released: cuBLASMp

cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. It is available to download in Preview now.

1 MIN READ