Matthew Nicely

Matthew Nicely is a senior product manager for Deep Learning Compilers at NVIDIA, working with cuDNN and CUTLASS. At NVIDIA, he has also worked as a public sector solution architect and as a CUDA Math Libraries product manager. In 2019, he received his Ph.D. in computer engineering, focusing on algorithm optimizations on GPUs.

Posts by Matthew Nicely

Agentic AI / Generative AI

Boosting MoE Training Throughput with Advanced Fusion Kernels

Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable... 9 MIN READ
Networking / Communications

Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication

The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to... 14 MIN READ
Networking / Communications

NCCL Deep Dive: Cross Data Center Communication and Network Topology Awareness

As the scale of AI training increases, a single data center (DC) is not sufficient to deliver the required computational power. Most recent approaches to... 9 MIN READ
Data Center / Cloud

Just Released: CUTLASS 3.8

Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance... 1 MIN READ
Data Center / Cloud

Just Released: NVIDIA cuDNN 9.7

Bringing support for NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash... 1 MIN READ
Agentic AI / Generative AI

Accelerating Transformers with NVIDIA cuDNN 9

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library for accelerating deep learning primitives with state-of-the-art performance.... 12 MIN READ