PICO Classification – Using Spark NLP to improve PICO Element Identification

21.07.2022

Sai Shailesh

The proliferation of healthcare data has contributed to the widespread usage of the PICO paradigm for creating specific clinical questions from RCT. PICO is a mnemonic that stands for:

Population/problem: Addresses the characteristics of populations involved and the specific characteristics of the disease or disorder.
Intervention: Addresses the primary intervention (including treatments, procedures, or diagnostic tests) along with any risk factors.
Comparison: Compares the efficacy of any new interventions with the primary intervention.
Outcome: Measures the results of the intervention, including improvements or side effects.

PICO is an essential tool that aids evidence-based practitioners in creating precise clinical questions and searchable keywords to address those issues. It calls for a high level of technical competence and medical domain knowledge, but it’s also frequently very time-consuming.

Automatically identifying PICO elements from this large sea of data can be made easier with the aid of machine learning (ML) and natural language processing (NLP). This facilitates the development of precise research questions by evidence-based practitioners more quickly and precisely.

Empirical studies have shown that the use of PICO frames improves the specificity and conceptual clarity of clinical problems, elicits more information during pre-search reference interviews, leads to more complex search strategies, and yields more precise search results.

Information regarding the PICO classifier model in Spark NLP

Live demo of this model

Let’s have a look at how this model can accurately identify medical texts using the PICO framework as an example.

To get the word embeddings through BERT. We will use Spark NLP annotator called BertEmbeddings()

BertEmbeddings() annotator will take sentenceand token columns and populate Bert embeddings in bert column. In general, each token is translated into a 768-dimensional vector.

After that, we use SentenceEmbeddings which converts the results from WordEmbeddings, BertEmbeddings, or ElmoEmbeddings into a sentence or document embeddings by either summing up or averaging all the word embeddings in a sentence or a document (depending on the inputCols).

Now, let’s fit the pipeline and see the results.

Results:

Conclusion

We’ve discussed the significance of the PICO framework as well as how time-consuming, prone to human error, and requiring a lot of procedure and medical knowledge it can be. In addition, with the amount of pertinent health-care data being produced at an exponential rate, it has become more and more challenging to manually search for and identify PICO aspects. With far less human labour, practitioners can achieve good results by using Spark NLP to search the literature for PICO elements.

SparkNLP Resources

Sai Shailesh

Our additional expert:

👋 I am a Computer Science Senior who juggles his studies and life as a Data Scientist. ⚓ I am Currently working at John Snow Labs as a Data Scientist and maintaining/contributing to the SparkNLP library.