Kunlun Li

Kunlun Li is an AI developer and technology engineer at NVIDIA, specializing in CUDA programming and performance optimization of LLM training. He has contributed to key features such as FP8 training, context parallelism, and kernel optimization in Megatron-Core and Transformer-Engine.

Posts by Kunlun Li

Agentic AI / Generative AI Jan 28, 2026

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It... 12 MIN READ