Matthew Nicely

Matthew Nicely joined NVIDIA in March 2019, having previously worked at the U.S. Army Aviation and Missile Research, Development, and Engineering Center in Huntsville, AL, USA. There, he focused on CUDA algorithm development and optimization on the Jetson series. At NVIDIA, he worked in the Federal segment, assisting customers with CUDA development and optimization and providing education and proofs of concept on various NVIDIA tool sets, before recently transitioning to product manager for math libraries. In 2019, he received his Ph.D. in computer engineering, focusing on algorithm optimizations on GPUs.

Posts by Matthew Nicely


Programming Distributed Multi-GPU Tensor Operations with cuTENSOR v1.4

The NVIDIA cuTENSOR library v1.4 supports 64-dimensional tensors and distributed multi-GPU tensor operations, and improves tensor contraction performance models.

Implementing High Performance Matrix Multiplication Using CUTLASS v2.8

High-performance CUTLASS template abstractions support matrix multiply operations (GEMM), Convolution AI, and improved Strided-DGrad.

Accelerating ReLu and GeLu Activation Functions, and Batched Sparse GEMM in cuSPARSELt v0.2.0

NVIDIA cuSPARSELt v0.2 now supports ReLu and GeLu activation functions, a bias vector, and batched Sparse GEMM.

Using Fully Redesigned Batch API and Performance Optimizations in nvCOMP v2.1.0

The new nvCOMP v2.1.0 library features a redesigned batch API and performance optimizations.

cuSOLVERMp v0.0.1 Now Available: Through Early Access

cuSOLVERMp provides a distributed-memory multi-node and multi-GPU solution for solving systems of linear equations at scale! In the future, it will also solve eigenvalue and singular value problems.

nvCOMP v2.0.0 Now Available: With New Compressors

nvCOMP is a CUDA library that features generic compression interfaces to enable developers to use high-performance GPU compressors in their applications.