Roman Anders

Roman Anders is a software engineer on the cuDNN team at NVIDIA, where he focuses on Flash Attention optimizations for inference and training workloads across current and next-generation GPU architectures. His contributions at NVIDIA span RNNs, matrix multiplication, and convolutions. Previously, he served as an engineer on the Intel MKL team, where he developed Sparse BLAS, Direct Sparse Solvers, and FFT. He holds a master's degree in applied mathematics and programming from Novosibirsk State University in Russia.

Posts by Roman Anders

Data Center / Cloud

Making Softmax More Efficient with NVIDIA Blackwell Ultra

LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query... 10 MIN READ