Dan Su

Dan Su is a research scientist at NVIDIA. Her current research is focused on large language model pretraining. She received her PhD in NLP from the Hong Kong University of Science and Technology.
Posts by Dan Su

Conversational AI

Announcing Nemotron-CC: A Trillion-Token English Language Dataset for LLM Pretraining

NVIDIA is excited to announce the release of Nemotron-CC, a 6.3-trillion-token English language Common Crawl dataset for pretraining highly accurate large...