Zhaoyuan He

Zhaoyuan He is a senior deep learning software engineer on the NVIDIA TensorRT team, specializing in efficient GPU inference for large language models. His technical interests span the performance optimization techniques that power modern inference frameworks, including kernel development, graph optimization, runtime execution, quantization, and distributed inference with collective communication optimizations. He works on advancing these techniques to deliver higher throughput and lower latency for end-to-end LLM serving on NVIDIA platforms. Zhaoyuan holds a Ph.D. in computer science from The University of Texas at Austin and an M.S. in electrical and computer engineering from the University of California, San Diego.

Posts by Zhaoyuan He

Developer Tools & Techniques

Scaling AI Inference Across Multiple GPUs Using NVIDIA TensorRT with Multi-Device Inference Support

Generative AI workloads are rapidly outgrowing the memory and compute budget of single GPUs. For inference developers building media generation pipelines, the...

11 MIN READ