Training a Distributed NER model for drug, disease and condition identification

The recognition of drugs, diseases, and conditions in electronic medical records is a central subtask of information extraction in clinical research and the broader healthcare domain.

One solution to this task is Named Entity Recognition (NER), which identifies named entities in unstructured text and classifies them into pre-defined categories. BERT (Bidirectional Encoder Representations from Transformers) is a general-purpose language model widely used as a base for NER.
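To make the task concrete, here is a toy sketch of what NER output looks like for clinical text. It uses a simple dictionary lookup rather than a learned model, and the mini-lexicon and category labels are invented for illustration; a real system such as a BERT-based tagger learns these categories from annotated data.

```python
# Hypothetical mini-lexicon mapping surface forms to entity types.
# In a trained NER model, these associations are learned, not listed.
LEXICON = {
    "metformin": "DRUG",
    "aspirin": "DRUG",
    "diabetes": "DISEASE",
    "hypertension": "CONDITION",
}

def tag_entities(text):
    """Return (token, entity_type) pairs for tokens found in the lexicon."""
    entities = []
    # Crude tokenization: strip basic punctuation and split on whitespace.
    for token in text.lower().replace(".", " ").replace(",", " ").split():
        if token in LEXICON:
            entities.append((token, LEXICON[token]))
    return entities

print(tag_entities("Patient with diabetes was prescribed metformin."))
# → [('diabetes', 'DISEASE'), ('metformin', 'DRUG')]
```

A learned model generalizes where this lookup cannot: it can tag unseen drug names from context, which is exactly where pretrained biomedical language models earn their keep.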

There are variations of BERT pretrained on scientific and biomedical data for NER in the healthcare domain. These models perform well on static data but degrade in commercial settings, where the data seen at inference time continually drifts away from the training data. One remedy is to retrain the model on a fixed schedule, but this is both time- and compute-intensive.

As a solution, Mukesh will present an NER model that is trained in a distributed fashion in production, along with the challenges of building such a model.

Language-based Pre-training for Drug Discovery

Pretraining has taken the NLP world by storm, as ever-larger language models have broken successive benchmarks. In this talk, I’ll review...