Kunlun Li

Kunlun Li is an AI developer and technology engineer at NVIDIA, specializing in CUDA programming and performance optimization of LLM training. He has contributed to key features such as FP8 training, context parallelism, and kernel optimization in Megatron-Core and Transformer-Engine.

Posts by Kunlun Li

Agentic AI / Generative AI

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It...

12 MIN READ