GTC-DC 2019: NV-Bio-BERT: Building Biomedical Domain-Specific Natural Language Understanding with BERT
Hoo Chang Shin, NVIDIA
We’ll describe the future of contextual word embedding models like Bidirectional Encoder Representations from Transformers (BERT), which demonstrate improved performance in a wide range of natural language processing (NLP) tasks. But it’s not easy to apply BERT to domain-specific corpora like biomedical texts. There are several pre-trained models, such as BioBERT, SciBERT, and ClinicalBERT, and though they show improved biomedical text mining, they don’t advance in general domain NLP tasks. We’ll show how to use BERT to make a biomedical domain-specific language model, and how to accelerate the iterative hypothesis testing on multi-node GPUs.