
Medical Screening Challenges in the Era of Big Data

Medical screening in healthcare means detecting early asymptomatic disease or the precursors of disease.

In other words, it means applying a test to detect a potential disease in people who show no signs or symptoms of that disease or condition.

Currently, while the world is suffering from the COVID-19 pandemic, it is important to choose every term precisely so that the reader understands what each one really means.

It is important here to clarify that “the patient shows no known symptoms” is not equivalent to “the patient is asymptomatic”, because patients might have symptoms that they are not aware of. Screening tests are not limited to diseases; they are also used to detect risk factors.

An accurate and reliable screening test is critical to overall healthcare service quality because it allows early detection of disease, early diagnosis, and early intervention.

The identification of a disease, condition, or a risk factor can be achieved by one or more of the following:

– Physician examination (e.g.: measuring blood pressure).

– Medical Imaging (e.g.: Plain X-ray, MRI, CT Scan).

– Laboratory investigations (e.g.: measuring blood glucose level).

– Procedure (e.g.: Sigmoidoscopy).

If you have many screening test datasets and are unsure which one to choose for your research project or to use as a training dataset, you should first know the features of a good screening test.


Features of a Good Screening Test

– The disease or the condition should be relatively common in the community.

– The test should address a serious health problem (high mortality and/or morbidity).

– The test should be performed in an early pre-clinical stage.

– Early detection and intervention should be known to improve the outcome, while delayed intervention is known to worsen it.

– A drug or treatment should be available for the condition or disease of interest.

– Healthcare facilities or points of care should be available and accessible for the detected cases.

– The test should be safe, simple, quick, and cheap, and nurses or paramedics should be able to perform it.

– The test should be “sensitive”, “specific”, “accurate”, and “valid”.

Here, the reader should understand the meanings of, and the differences between, the following terms: sensitivity, specificity, accuracy, validity, reliability, and predictive value.

“Sensitivity” means classifying a diseased person as likely to have the condition, while “Specificity” means classifying a non-diseased person as unlikely to have the condition.

In other words, sensitivity is the ability of the test to identify all true cases, while specificity is the ability of the test to give negative results for people who do not have the condition (that is, to give positive results for true cases only and so avoid false positives).

“Overall Validity” means the ability of the test to correctly identify true cases and to correctly rule out people who do not have the condition.

“Predictive Value (PV)” is subdivided into Positive Predictive Value (PPV) and Negative Predictive Value (NPV).

“Reliability” is the extent to which the screening test produces the same results each time it is administered. Reliability should hold despite different sources of variability, such as biological variation, the instruments used in the test, or the screener’s measurements and methods for applying the screening test.
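One common way to put a number on reliability is Cohen's kappa, which compares the observed agreement between two administrations of a test (or two screeners) with the agreement expected by chance. The article does not name a specific statistic, so the choice of kappa is an assumption here, and the function name `cohen_kappa` and the ten results below are invented for illustration only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical results."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Share of people for whom both administrations gave the same result.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each administration's own totals.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one and the same category
    return (observed - expected) / (1 - expected)


# Invented example: the same 10 people screened twice (1 = positive, 0 = negative).
first_pass = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
second_pass = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
kappa = cohen_kappa(first_pass, second_pass)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance; the two passes above agree on 8 of 10 people but land well below 1 once chance agreement is removed.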

PPV measures to what extent a positive screening test predicts true cases, while NPV measures to what extent a negative screening test predicts well persons.
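The terms defined above can all be computed from the four cells of a 2x2 table that cross-tabulates the screening result against true disease status. The following is a minimal sketch using the standard textbook formulas; the function names and all counts are hypothetical and are not taken from any of the datasets mentioned in this article.

```python
def screening_metrics(tp, fp, fn, tn):
    """Summary measures from the four cells of a 2x2 screening table.

    tp / fn: diseased people with a positive / negative test.
    fp / tn: non-diseased people with a positive / negative test.
    Assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # true cases that test positive
        "specificity": tn / (tn + fp),  # non-cases that test negative
        "accuracy": (tp + tn) / total,  # all correct results
        "ppv": tp / (tp + fp),          # positives that are true cases
        "npv": tn / (tn + fn),          # negatives that are truly well
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV for a given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Invented counts: 1,000 people screened, 100 of whom have the disease.
metrics = screening_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same hypothetical test, applied where the disease is common (10%) or rare (1%).
ppv_common = ppv_at_prevalence(0.90, 0.95, prevalence=0.10)
ppv_rare = ppv_at_prevalence(0.90, 0.95, prevalence=0.01)
print(f"PPV at 10% prevalence: {ppv_common:.3f}")
print(f"PPV at 1% prevalence: {ppv_rare:.3f}")
```

With sensitivity and specificity held fixed, PPV falls as the disease becomes rarer, which is consistent with the earlier point that the condition screened for should be relatively common in the community.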

– If it is important not to miss a disease (as with cancer), it is better to choose a test with greater sensitivity, while if a positive diagnosis might cause much worry or expense (as with HIV), it is better to choose a test with greater specificity.
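The trade-off between sensitivity and specificity can be illustrated with a single test that returns a numeric score and calls a result positive at or above a cut-off. This is an assumption made for the sketch (the article speaks of choosing between tests, not of tuning one), and the scores, disease labels, and the function name `sens_spec_at_threshold` are invented.

```python
def sens_spec_at_threshold(scores, has_disease, threshold):
    """Sensitivity and specificity when score >= threshold counts as positive."""
    pairs = list(zip(scores, has_disease))
    tp = sum(s >= threshold and d for s, d in pairs)
    fn = sum(s < threshold and d for s, d in pairs)
    tn = sum(s < threshold and not d for s, d in pairs)
    fp = sum(s >= threshold and not d for s, d in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: 4 people with the disease, 6 without.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.35, 0.30, 0.20, 0.15, 0.05]
has_disease = [True, True, True, True, False, False, False, False, False, False]

for threshold in (0.30, 0.50, 0.75):
    sens, spec = sens_spec_at_threshold(scores, has_disease, threshold)
    print(f"cut-off {threshold:.2f}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the cut-off catches every true case at the cost of more false positives, while raising it removes the false positives but misses half of the true cases.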


Types of Screening

1- Mass Screening:

It is suitable for large population groups that vary in their risk of disease; everyone in the group is screened.

2- Selective (Targeted) Screening:

It is suitable for groups at high risk of a disease or condition, and may include relatives of patients. It is expected to detect more potential cases of a given disease (e.g.: relatives of people with genetic diseases, occupational groups, and people with known precancerous states).

3- Population Screening:

It is suitable for screening low-risk people, where everyone is screened, so it should not be expensive (e.g.: Breast Cancer Screening, Cervical Cancer Screening, and Neonatal Thyroid Screening).


Pros and Cons of Screening Using Big Data

Screening based on recent advances in handling big data has been behind the great progress in the field of Precision Medicine, which in turn has led to the prevention of many potential diseases, in addition to the detection of risk factors or discrepancies that might not seem to be relevant to the disease.

On the other hand, some studies [1] have reported that the improvements resulting from big data screening in preventive precision medicine may be hindered by the problem of “Overdiagnosis”.

To address the problem of overdiagnosis, prognostic studies should entail long follow-up (perhaps a decade or more) so that overdiagnosis can be better investigated and quantified.


Samples from Screening Datasets

The John Snow Labs catalog includes three important datasets that show the importance of good data wrangling for reliable screening test results and for the future of precision medicine. The reader can download a free sample of any of them.

The first dataset presents congenital disease screening in newborns in California. Counts are reported by region and for the State of California as a whole. The data is from the California Health and Human Services Open Data Portal.

The second dataset is sourced from Public Health England and consists of the percentage of people in the resident population eligible for screening who were screened adequately within the previous years (2010 to 2016) for bowel, cervical, and breast cancer.

The third dataset consists of records for 858 patients with 36 variables, focusing on the prediction of indicators or diagnosis of cervical cancer. The dataset provides demographic information, habits, and historical medical records of the 858 patients from Hospital Universitario de Caracas in Caracas, Venezuela.


[1] Vogt H, Green S, Ekstrøm CT, Brodersen J. How precision medicine and screening with big data could increase overdiagnosis. BMJ. 2019;366(September).
