Matthew Nicely

Matthew Nicely is a senior product manager for Deep Learning Compilers at NVIDIA, working with cuDNN and CUTLASS. Earlier at NVIDIA, he worked as a public sector solutions architect and as a CUDA Math Libraries product manager. In 2019, he received his Ph.D. in computer engineering, with a focus on optimizing algorithms for GPUs.

Posts by Matthew Nicely

Agentic AI / Generative AI Jun 15, 2026

Boosting MoE Training Throughput with Advanced Fusion Kernels

Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable...
9 MIN READ

Networking / Communications Jul 22, 2025

Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication

The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to...
14 MIN READ

Networking / Communications Jul 14, 2025

NCCL Deep Dive: Cross Data Center Communication and Network Topology Awareness

As the scale of AI training increases, a single data center (DC) is not sufficient to deliver the required computational power. Most recent approaches to...
9 MIN READ

Data Center / Cloud Feb 03, 2025

Just Released: CUTLASS 3.8

Provides support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance...
1 MIN READ

Data Center / Cloud Jan 31, 2025

Just Released: NVIDIA cuDNN 9.7

Bringing support for NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash...
1 MIN READ

Agentic AI / Generative AI May 24, 2024

Accelerating Transformers with NVIDIA cuDNN 9

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library for accelerating deep learning primitives with state-of-the-art performance....
12 MIN READ