Allen Zhao

Allen Zhao (Wenyi Zhao) is a senior compute architect engineer specializing in cutting-edge AI compiler technologies, including both graph-level and tile-level compilation. His expertise lies in optimizing the execution efficiency of AI models across diverse hardware architectures, especially GPGPUs. He's passionate about translating theoretical compiler advancements into practical, high-impact solutions for the next generation of artificial intelligence. He holds a Master's degree from Shanghai Jiao Tong University.

Posts by Allen Zhao

Developer Tools & Techniques

Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile

In this post, we dive into one of the most critical workloads in modern AI: Flash Attention, where you’ll learn: How to implement Flash Attention using NVIDIA...

20 MIN READ