Posts by Alexander Zhurkevich
Data Center / Cloud
Feb 25, 2026
Making Softmax More Efficient with NVIDIA Blackwell Ultra
LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query...
10 MIN READ
MLOps
Dec 16, 2022
Simplifying and Accelerating Machine Learning Predictions in Apache Beam with NVIDIA TensorRT
Loading and preprocessing data for running machine learning models at scale often requires seamlessly stitching the data processing framework and inference...
11 MIN READ