Ying Lin

Ying Lin is a research scientist at NVIDIA, where his work mainly focuses on enhancing pretraining data quality and generating synthetic data. Prior to joining NVIDIA, he worked on natural language understanding at Apple. He earned his PhD from the University of Illinois Urbana-Champaign.

Posts by Ying Lin

Conversational AI

Announcing Nemotron-CC: A Trillion-Token English Language Dataset for LLM Pretraining

NVIDIA is excited to announce the release of Nemotron-CC, a 6.3-trillion-token English language Common Crawl dataset for pretraining highly accurate large...