Training a Distributed NER model for drug, disease and condition identification

Recognizing drugs, diseases, and conditions in electronic medical records is an important information extraction subtask in clinical research and the healthcare domain at large.

One solution to this task is Named Entity Recognition (NER), which identifies named entities in unstructured text and classifies them into predefined categories. BERT (Bidirectional Encoder Representations from Transformers) is a general-purpose language model that can be fine-tuned for NER.

There are variants of BERT pretrained on scientific and biomedical data for NER in the healthcare domain. These models perform well on static data but start to fail in a commercial setting, where the inference data continuously shifts away from the training data over time. One remedy is to retrain the model on a set schedule, but this is both time- and compute-intensive.

As an alternative, Mukesh will present an NER model that does distributed training in production and discuss the challenges in building such a model.

About the speaker

Mukesh Mithrakumar

Sr. Machine Learning Engineer at IQVIA

Mukesh Mithrakumar is a Senior Machine Learning Engineer at IQVIA.

He works within the ADA Team to build a healthcare-focused machine learning platform for IQVIA and its global partners.

He focuses primarily on use cases that require solutions based on Natural Language Processing or recommender systems.

Before IQVIA, he founded AD AI Solutions, a consulting firm that helped early-stage startups integrate machine learning into their products.