Mostofa Patwary

Mostofa Patwary is a senior deep learning research scientist on the Applied Deep Learning Research team at NVIDIA. Mostofa's research interests span natural language processing, scalable deep learning, HPC, and algorithm engineering. Prior to joining NVIDIA, Mostofa worked on scaling large language models and the predictability of scaling deep learning applications at Baidu's Silicon Valley AI Lab. Mostofa also made significant contributions to developing large-scale code for several core machine learning kernels capable of running on supercomputers.

Posts by Mostofa Patwary

AI / Deep Learning

Scaling Language Model Training to a Trillion Parameters Using Megatron

Natural Language Processing (NLP) has seen rapid progress in recent years as computation at scale has become more available and datasets have become larger. 17 MIN READ

AI / Deep Learning

Adding External Knowledge and Controllability to Language Models with Megatron-CNTRL

Large language models such as Megatron and GPT-3 are transforming AI. We are excited about applications that can take advantage of these models to create better… 8 MIN READ

AI / Deep Learning

State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU

Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as… 9 MIN READ