Big Data Role in Cardiology

01.12.2020

Mohamed Tharwat

The role of big data in healthcare cannot be denied. The Covid-19 pandemic this year (2020) has made it drastically clear that big data has become an essential companion to every healthcare specialty. One of the specialties in which big data has played an especially important role is cardiology.

Insights derived from Electronic Health Records (EHRs) have contributed markedly to cardiovascular disease (CVD) research in several fields. The following are some CVD research fields in which big data has played an important role:

– Imaging: Cardiac Magnetic Resonance Imaging (MRI).

– Integration of varied data types: cardiac MRI data, genetic data, and biomarker data.

– Mobile and smartwatch applications: cardiac arrhythmias can now be recorded and identified through smartwatch applications.

– Classification of heart failure cases: this has been done using specific machine learning algorithms.

– Drug surveillance: adverse drug events are recorded and analyzed after a drug becomes available on the market.

Big data has made it necessary to modify conventional research methodologies. The emergence of healthcare research networks using modern techniques has facilitated the integration of different data types from different locations at the regional and international levels. This has allowed not only better data processing and analysis but also better monitoring and evaluation of clinical applications.

Data Analytics

A main aim of healthcare research is to test a null hypothesis, that is, to reject it or fail to reject it. The null hypothesis assumes that there is no difference in a specific characteristic between the groups of a population being compared.

For small to medium-sized data sets, one test could be enough to decide whether to reject the null hypothesis.

On the other hand, big data may require more than one test and may call for additional measures (such as the correlation between the characteristics under study).
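As a minimal sketch of the two kinds of measure mentioned above, the following Python example runs a permutation test against the null hypothesis of no difference in means between two groups, and computes a Pearson correlation between two characteristics. The function names and the sample readings are invented for illustration only; a real study would choose its tests according to the study design.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test of the null hypothesis that both
    groups have the same mean."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Invented systolic blood pressure readings (mmHg), for illustration only.
treated = [118, 122, 125, 119, 121, 117, 123, 120]
control = [135, 138, 132, 140, 136, 134, 139, 137]

p_value = permutation_p_value(treated, control)
reject_null = p_value < 0.05  # conventional significance level, an assumption
```

A small p-value leads to rejecting the null hypothesis; a large one means only that the data do not contradict it.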

Big data analytics draws on a range of methodologies, including logistic regression, principal component analysis, Bayesian analysis, decision trees, and neural networks.

That is why there is a global shortage of big data experts. Many fields hold enormous amounts of data and need people who are experts both in the domain (healthcare, for example) and in big data analytical tools, with a sound background in statistics, programming, SQL, and NoSQL database management systems. People of this caliber are exceedingly rare and hard to find. This is why data science has been called the sexiest job of the 21st century.

Big Data and High-Performance Computing (HPC)

The union of Big Data and HPC is expected to extend the role of genomic analysis in the future to provide better decision-making.

Genomic analysis played a prominent role in finding an effective therapy that targets mutations of the PCSK9 gene, a predisposing factor for familial hypercholesterolemia.

Such drugs are called “inductive drugs”. They target a specific gene that has been shown to contribute to the development of a certain disease. Identifying the patients who carry the gene, tracing disease progression, and measuring the effect of the drug on the targeted gene are all achieved through big data analytics applied to Electronic Health Records (EHRs) and genomic databases.

Hadoop is an open-source system (discussed in a previous blog) that appears to be well suited to such projects, both for data storage and as a data analysis platform. It uses parallel processing and fault-tolerant distributed processing techniques.

Google introduced MapReduce, a programming model for parallel computing that Hadoop also implements. It allows huge volumes of data distributed across clusters of computers to be processed in parallel. With this method, the data can be processed where it is stored, with no need to download it first.
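The idea behind the map and reduce steps can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's API; the record layout (an `icd10` field per record) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (diagnosis_code, 1) pair for each record in a chunk."""
    return [(record["icd10"], 1) for record in chunk]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_diagnoses(chunks):
    # On a real cluster each chunk would be mapped on the node that
    # stores it; here the chunks are simply processed one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical records split into two chunks, as if stored on two nodes.
chunks = [
    [{"icd10": "I21"}, {"icd10": "I25"}],
    [{"icd10": "I21"}, {"icd10": "I20"}],
]
counts = count_diagnoses(chunks)
```

Because each chunk is mapped independently, the map step can run in parallel on as many nodes as there are chunks, which is what makes the model scale.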

Hadoop clusters can be customized and provisioned with huge storage volumes, and they are machine-independent. Moreover, data analytics can act as a monitoring tool in which alerts can be configured for radiation doses, patient throughput, and procedural times in the catheterization laboratory.
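A configurable alert of this kind could look like the following minimal sketch. The field names and the threshold values are hypothetical and chosen only to make the example run; they are not clinical guidance, and a real laboratory would set its own limits.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    procedure_id: str
    radiation_dose_gy: float  # cumulative dose for the procedure
    duration_min: float       # procedural time in minutes


# Hypothetical alert thresholds, for illustration only.
THRESHOLDS = {"radiation_dose_gy": 5.0, "duration_min": 120.0}


def check_alerts(procedure, thresholds=THRESHOLDS):
    """Return one alert message per monitored field that exceeds its limit."""
    alerts = []
    for field_name, limit in thresholds.items():
        value = getattr(procedure, field_name)
        if value > limit:
            alerts.append(
                f"{procedure.procedure_id}: {field_name}={value} exceeds {limit}"
            )
    return alerts


alerts = check_alerts(Procedure("P-001", radiation_dose_gy=6.2, duration_min=95.0))
```

Keeping the thresholds in a plain dictionary means new monitored measures, such as patient throughput, could be added without changing the checking logic.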

Data Sources

Conventional interventional cardiology is well known for using data-centric governance models. Andreas Gruentzig collected data from geographically dispersed research centers to improve the quality of the insights in his research on coronary angioplasty.

The American College of Cardiology initiated its cardiac catheterization and angioplasty registry in 1994. That was a preliminary step toward the CathPCI Registry of the National Cardiovascular Data Registry (NCDR), which, 25 years later, accounts for 90% of procedural data in the United States.

These data provide descriptive analytics that serve as a measure of quality improvement with regard to procedures and outcomes in many healthcare organizations.

Data quality is critical to providing accurate analytics. In many huge research projects, data integration is a problem because data is collected from different entry points: it may arrive in different formats, and values may be missing or inaccurate.
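The kind of cleanup this implies can be sketched as follows. The example normalizes a date that may arrive in several formats and flags missing or implausible values; the field names (`procedure_date`, `ejection_fraction`) and the list of accepted date formats are assumptions made for illustration.

```python
from datetime import datetime

# Date formats assumed to occur across the contributing sites.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def parse_date(raw):
    """Return the date as an ISO 8601 string, or None if it cannot be parsed."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_percent(raw):
    """Return a percentage as a float in [0, 100], or None if missing or invalid."""
    if raw is None:
        return None
    try:
        value = float(str(raw).strip().rstrip("%"))
    except ValueError:
        return None
    return value if 0 <= value <= 100 else None


def harmonize(record):
    """Return a cleaned copy of the record plus a list of data quality issues."""
    clean = {
        "procedure_date": parse_date(record.get("procedure_date")),
        "ejection_fraction": parse_percent(record.get("ejection_fraction")),
    }
    issues = [f"{name} missing or invalid" for name, value in clean.items()
              if value is None]
    return clean, issues


clean, issues = harmonize({"procedure_date": "03/04/2020", "ejection_fraction": "55%"})
```

Returning the issues alongside the cleaned record, rather than silently dropping bad values, lets a project track how much of its data is missing or inaccurate at each entry point.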

John Snow Labs can be an excellent source for researchers working in the cardiology field. The company provides several datasets in this field that could yield important insights and reveal new facts when explored and analyzed by suitable domain experts. Having curated and standardized data can make researchers’ jobs much easier and save them a lot of time.

The following are samples from the available datasets related to the cardiology field:

– The John Snow Labs catalog offers a dataset that provides information about Coronary Heart Disease Deaths. It contains the number of coronary heart disease-related deaths (ICD-10 codes I20-I25), where the numerator is the number of coronary heart disease-related deaths and the denominator is the number of persons. The data includes deaths due to ischemic heart diseases (acute myocardial infarction, other acute ischemic heart diseases, and other forms of chronic ischemic heart disease).

– The reader can find another dataset that contains details on the measure “Adherence To Statin Therapy With Coronary Artery Disease NQF 0543”. National Quality Forum (NQF) 0543 is the percentage of individuals with Coronary Artery Disease (CAD) who are prescribed statin therapy and had a Proportion of Days Covered (PDC) for oral statin medications of at least 0.8 during the measurement year 2011.

– Another dataset that may be interesting to cardiologists is the one that gives information on Coronary Angiography rates of Medicare beneficiaries at Hospital Referral Regions (HRR) for the year 2012. Hospitalization rates represent the number of discharges that occurred in a defined time period (the numerator) for a specific population (the denominator).
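The two kinds of measure described in the list above, a rate built from a numerator and a denominator and a Proportion of Days Covered (PDC), can be sketched as follows. This is a simplified illustration: the function names, the per-100,000 scaling, and the sample pharmacy fills are assumptions, and the official NQF 0543 specification contains details that are not reproduced here.

```python
from datetime import date, timedelta


def rate_per(numerator, denominator, per=100_000):
    """Rate such as deaths or discharges per `per` persons (scale is an assumption)."""
    return numerator * per / denominator


def proportion_of_days_covered(fills, period_start, period_end):
    """Share of days in the period on which the patient had medication on hand.

    `fills` is a list of (fill_date, days_supply) pairs. Overlapping fills
    are counted once per calendar day.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)
    period_days = (period_end - period_start).days + 1
    return len(covered) / period_days


def is_adherent(pdc, threshold=0.8):
    """The measure described above counts a PDC of at least 0.8 as adherent."""
    return pdc >= threshold


# Hypothetical statin fills for one patient in measurement year 2011.
fills = [(date(2011, 1, 1), 90), (date(2011, 4, 1), 90),
         (date(2011, 7, 1), 90), (date(2011, 10, 1), 90)]
pdc = proportion_of_days_covered(fills, date(2011, 1, 1), date(2011, 12, 31))
adherent = is_adherent(pdc)
```

Collecting covered days in a set is what prevents an early refill from being counted twice.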

By utilizing advanced tools such as Generative AI in Healthcare, cardiologists can enhance decision-making processes, driving more accurate diagnoses and treatment plans. Additionally, the integration of Healthcare Chatbot technology provides valuable support by offering real-time patient interaction, helping manage heart disease care efficiently and improving overall patient outcomes.

Mohamed Tharwat

Our additional expert:

Mohamed joined John Snow Labs (JSL) in Feb. 2016 as a Healthcare Researcher and Author. In addition to 20+ years of experience across different healthcare domains (management, training, curricula design, solution architecture, clinical, research, and data management), Mohamed has good experience working with SQL, big data, machine learning, and Python. Before joining JSL, Mohamed worked as a Healthcare Facility Manager in his own private practice. He has also worked as a data manager, training consultant, and eHealth Researcher in various companies and organizations in Egypt, Canada, and the US (remotely).