TJ Xu

TJ Xu is a software engineer at NVIDIA working on horizontal scaling optimizations in XLA.

Posts by TJ Xu

Data Center / Cloud

Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA

Running inference with large language models (LLMs) in production requires meeting stringent latency constraints. A critical stage in the process is LLM decode,...