Shizhe Diao

Shizhe Diao is a research scientist at NVIDIA Research, where he works on efficient training and alignment of foundation models. He completed his PhD at the Hong Kong University of Science and Technology. Shizhe has seven years of experience in machine learning and natural language processing, and is the first author of the popular post-training project LMFlow.

Posts by Shizhe Diao

Agentic AI / Generative AI Dec 01, 2025

Train Small Orchestration Agents to Solve Big Problems

Using the right tool and model for a task is a challenging and ever-present engineering problem in agent design. At NVIDIA Research, we're making fast progress... 7 MIN READ

Agentic AI / Generative AI Nov 19, 2025

Breaking Through Reinforcement Learning Training Limits with Scaling Rollouts in BroRL

When training large language models (LLMs) with reinforcement learning from verifiable rewards (RLVR), one of the most compelling questions is how to overcome... 7 MIN READ

Agentic AI / Generative AI Aug 13, 2025

Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2

Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning... 8 MIN READ

Agentic AI / Generative AI Nov 22, 2024

Hymba Hybrid-Head Architecture Boosts Small Language Model Performance

Transformers, with their attention-based architecture, have become the dominant choice for language models (LMs) due to their strong performance,... 12 MIN READ