Posts by Sharath Sreenivas
Conversational AI / NLP
Dec 05, 2019
Pretraining BERT with Layer-wise Adaptive Learning Rates
Training with larger batches is a straightforward way to scale training of deep neural networks to larger numbers of accelerators and reduce the training time....
10 MIN READ