
Medical Imaging and Big Data – Part I: A Primer to Medical Imaging

This blog post offers a primer on medical imaging concepts, common terminology, and the medical imaging efficiency core measures used to assess how hospitals use medical imaging. Related topics, such as the challenges that medical imaging may face because of rapid developments in Big Data technologies and imaging techniques, will be discussed in a follow-up post next week.

1. Medical Imaging and the 4 V’s

When it comes to medical imaging, your body can be considered a source of big data. The volume of medical data stored by hospitals keeps growing as more information accumulates in Electronic Health Records (EHRs), and most of that data comes from medical imaging.

Medical imaging data satisfies the 4 V’s (volume, variety, velocity, and veracity), so we can treat it as big data.

It is expected that, by 2015, 80% of patient data will be unstructured data coming from medical imaging.

1.1 Different Modalities of Medical Imaging

There are different types of medical images. The most common types are:

1. Plain X-ray

Figure 1: Knee Plain X-ray.

2. Computerized Tomography (CT)

Figure 2: Computerized Tomography (CT) of the human brain (from the base of the skull to top).

3. Magnetic Resonance Imaging (MRI)

Figure 3: Magnetic Resonance Imaging (MRI) of the brain of a patient with Hunter syndrome.

4. Cone Beam Computed Tomography (CBCT)

Figure 4: Cone Beam Computed Tomography (CBCT).

5. Positron Emission Tomography (PET)

Figure 5: Positron Emission Tomography (PET) showing glucose metabolism in the brain of a patient with Alzheimer’s disease.

Medical imaging data differs from other types of data: it is more complex and demands a high degree of accuracy. Moreover, medical imaging requires larger storage repositories than traditional data.

1.2 Publicly Available Medical Image Repositories

To estimate how much storage a medical imaging database can occupy, you can check any of the publicly available medical image repositories:

  1. Cancer Imaging Archive Database (244,527 images occupying 241 GB)
  2. Public Lung Image Database (28,227 images occupying 28 GB)
  3. MS Lesion Segmentation (145 images occupying 36 GB)
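
As a rough illustration of what the figures listed above imply, the following minimal Python sketch divides each repository's size by its image count to get an average footprint per image. The function names and the decimal conversion (1 GB = 1000 MB) are assumptions made for this example, not something the repositories specify.

```python
# Figures quoted in the list above: (name, image count, size in GB).
REPOSITORIES = [
    ("Cancer Imaging Archive Database", 244_527, 241),
    ("Public Lung Image Database", 28_227, 28),
    ("MS Lesion Segmentation", 145, 36),
]

MB_PER_GB = 1000  # assumption: decimal units (1 GB = 1000 MB)


def average_image_size_mb(image_count: int, size_gb: float) -> float:
    """Return the mean storage footprint per image in megabytes."""
    return size_gb * MB_PER_GB / image_count


def summarize(repositories):
    """Map each repository name to its average image size in MB."""
    return {
        name: average_image_size_mb(count, size_gb)
        for name, count, size_gb in repositories
    }


if __name__ == "__main__":
    for name, avg_mb in summarize(REPOSITORIES).items():
        print(f"{name}: {avg_mb:.2f} MB per image")
```

Under these assumptions, the first two repositories come out at about 1 MB per image, while the third comes out at about 248 MB per image, so the per-image footprint can vary by more than two orders of magnitude between repositories.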


2.1 Picture Archival & Communication Systems (PACS)

PACS is a system used for the storage and retrieval of medical images. Using PACS, you can store both 2D and 3D images.

Radiologists use PACS to store diagnostic imaging files, so that other members of the healthcare team can search for and retrieve a medical image file (whether it is a CT, MRI, plain X-ray, or any other modality). Images are stored either on a local server or on a cloud computing platform.

PACS allows us to keep records and monitor a patient’s case by reviewing all the images stored for the case before the treatment or surgery and comparing them with the images taken after the intervention.
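
The store, search, and before/after comparison workflow described above can be sketched with a toy in-memory archive. This is only an illustration: a real PACS exchanges images over DICOM network services and keeps far richer metadata, and every name here (StoredImage, ToyArchive, and their fields) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredImage:
    """Hypothetical record for one archived image."""
    patient_id: str
    modality: str   # e.g. "CT", "MR", "XR"
    acquired: date  # acquisition date
    location: str   # file path or object key (local server or cloud)


class ToyArchive:
    """Hypothetical in-memory stand-in for a PACS archive."""

    def __init__(self):
        self._images = []

    def store(self, image: StoredImage) -> None:
        """Add one image record to the archive."""
        self._images.append(image)

    def find(self, patient_id, modality=None):
        """Return a patient's images, oldest first, optionally by modality."""
        hits = [
            img for img in self._images
            if img.patient_id == patient_id
            and (modality is None or img.modality == modality)
        ]
        return sorted(hits, key=lambda img: img.acquired)

    def split_by_intervention(self, patient_id, intervention: date):
        """Split a patient's images into those taken before the
        intervention date and those taken on or after it."""
        images = self.find(patient_id)
        before = [img for img in images if img.acquired < intervention]
        after = [img for img in images if img.acquired >= intervention]
        return before, after
```

For example, calling split_by_intervention with a patient identifier and the surgery date returns two lists that a clinician could review side by side.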

2.2 Radiology Information System (RIS)

RIS is a software application in which you can keep patients’ reports or merge several reports concerning the same patient. Through RIS you can therefore follow the patient’s prognosis over time, and you can also keep track of patients’ appointments and how the follow-up visits are going.

2.3 Digital Imaging and Communications in Medicine (DICOM)

DICOM is the standard used to integrate different medical imaging systems with the machines that capture the images; that is, it covers the transfer and management of medical imaging information and related data. The medical staff can then more easily compare different images for the same case (before and after intervention).

DICOM has since been extended to other specialties (for example, DICOM for pathology).

3. Medical Imaging and Evidence-Based Medicine

3.1 Medical Imaging and Healthcare Research

An Imaging Study is a representation of the content produced in a Digital Imaging and Communications in Medicine (DICOM) imaging study.

A study includes a set of series, each of which includes a set of Service-Object Pair Instances (SOP Instances: images or other data) acquired or produced in a common context.

A series contains only one modality (e.g. X-ray, CT, MR, ultrasound), but a study may have multiple series of different modalities.
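
The study, series, and SOP Instance hierarchy described above can be sketched as nested Python dataclasses. This is a minimal sketch rather than the DICOM data model itself: the class and field names are assumptions chosen for readability, and the only rule it enforces is the one stated above, that a series holds a single modality.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SOPInstance:
    """One image (or other data object) within a series."""
    sop_instance_uid: str  # hypothetical identifier field
    modality: str          # e.g. "CT", "MR", "US"


@dataclass
class Series:
    """A set of SOP Instances that all share one modality."""
    series_uid: str
    modality: str
    instances: List[SOPInstance] = field(default_factory=list)

    def add(self, instance: SOPInstance) -> None:
        # A series contains only one modality, so reject mismatches.
        if instance.modality != self.modality:
            raise ValueError(
                f"series {self.series_uid} is {self.modality}, "
                f"got {instance.modality}"
            )
        self.instances.append(instance)


@dataclass
class Study:
    """A set of series, possibly of different modalities."""
    study_uid: str
    series: List[Series] = field(default_factory=list)

    def modalities(self) -> Set[str]:
        """Return the distinct modalities present in the study."""
        return {s.modality for s in self.series}

    def instance_count(self) -> int:
        """Return the total number of SOP Instances across all series."""
        return sum(len(s.instances) for s in self.series)
```

A study built this way can hold, say, one CT series and one MR series, while an attempt to add an MR instance to the CT series raises a ValueError.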

The John Snow Labs database repository includes an important database that could serve as a good example of an Imaging Study.

The repository contains another dataset that shows how medical imaging can support clinicians with the best and most recent evidence for clinical decisions. The Treatment Studies for Advanced Imaging for Elderly Hip Fractures dataset provides strong evidence supporting the high applicability of advanced imaging for some outcomes in the management of hip fracture in the elderly.

3.2 Medical Imaging Efficiency Measures

Some healthcare facilities do not use medical imaging appropriately. Patient exposure should be kept to a minimum, both because of the side effects of radiation and contrast materials and because of the high cost of some investigations (like MRI or CT).

Big data analytics has allowed decision makers and governmental sectors to monitor how efficiently medical imaging is used.

The Centers for Medicare & Medicaid Services (CMS) receives claims from the facilities and hospitals that treat Medicare patients. CMS analyzes these claims to assess the use of medical imaging.

Here again, the John Snow Labs database repository is a good resource for data on the Outpatient Imaging Efficiency Core Measures, by hospital or by state.

Both datasets include hospital data for the Outpatient Imaging Efficiency Core Measures. These Core Measures give information about hospitals’ use of medical imaging tests for outpatients. Examples of medical imaging tests include CT scans, MRIs, and mammograms.
