
John Snow Labs’ Data Market is now Live 

November 2018 marks a milestone for healthcare data scientists who are hungry to create and innovate but don’t have the time to dig through data clutter, and for data journalists keen to promote better health with straightforward facts.

John Snow Labs is proud to announce the launch of our Data Market, which is now live for the scientist in you looking for clean and normalized datasets. It brings to life over 1,800 datasets, readily available and searchable, for the healthcare and life science communities. A team of experts carefully selected, meticulously cleaned, and persistently validated the files to ensure you get only the highest-quality version of the data you need.

If your organization uses artificial intelligence (AI) for analytics, mobile health app development, or solely for training models, we highly recommend that you visit our Data Market hub and take a closer look at our catalog of healthcare, life sciences, and terminology data. Simply type your keywords to search the entire database automatically. Open data is free and downloadable, while premium data packages are available via an annual subscription.

We at John Snow Labs ensure that the datasets are of top quality: each passes three layers of quality monitoring, and a team of domain experts curates, normalizes, optimizes, enriches, and updates it. We’ve also raised the bar and are proud of the agility of our datasets, as they load easily into Hadoop, Spark, Python, R, SAS, SQL, and any major Business Intelligence (BI) tool. Our datasets come in CSV and Parquet formats and include full metadata readable by both machines (JSON) and humans (PDF).
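To illustrate how a CSV file and its machine-readable JSON metadata can work together, here is a minimal Python sketch. The metadata field names (`columns`, `name`, `type`), the sample rows, and the helper `load_typed_rows` are assumptions made up for this example; they are not the actual schema or tooling shipped with Data Market datasets.

```python
import csv
import io
import json

# Hypothetical metadata: the field names below are assumptions for illustration,
# not the actual schema shipped with Data Market datasets.
METADATA_JSON = """
{
  "title": "Example dataset",
  "columns": [
    {"name": "state", "type": "string"},
    {"name": "year", "type": "integer"},
    {"name": "rate", "type": "number"}
  ]
}
"""

# Made-up sample rows standing in for a downloaded CSV file.
DATA_CSV = """state,year,rate
TX,2017,12.5
CA,2017,9.8
"""

# Map the (assumed) metadata type names to Python constructors.
CASTS = {"string": str, "integer": int, "number": float}


def load_typed_rows(csv_text, metadata_text):
    """Parse CSV text and cast each column using the types named in the metadata."""
    metadata = json.loads(metadata_text)
    casts = {col["name"]: CASTS[col["type"]] for col in metadata["columns"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: casts[name](value) for name, value in row.items()} for row in reader]


rows = load_typed_rows(DATA_CSV, METADATA_JSON)
print(rows[0])
```

In practice the same idea would apply to files read from disk, with the metadata driving column types instead of hand-written conversions.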

A Tour of the Healthcare & Life Science Data Market



Medical billing requires continuously successful processes from providers and insurance companies, including Medicare, to ensure uninterrupted services to patients and beneficiaries. John Snow Labs medical datasets from various clinical classification systems, such as the International Classification of Diseases (ICD)-10 Clinical Modification (CM) and Diagnosis-Related Groups (DRGs), come in very handy, supporting rapid payment and efficient operations and ultimately benefiting patient safety. This data package facilitates clustering patient diagnoses and procedures into a manageable number of clinically meaningful categories, grouping conditions and procedures without sorting through thousands of codes. It also supports analyzing and controlling costs, utilization, and outcomes for health plans, policymakers, and researchers.
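As a rough illustration of what "clustering diagnoses into a manageable number of categories" means, the Python sketch below groups a few ICD-10-CM codes by chapter using only the first letter of each code. This is a toy mapping chosen for the example, not the grouping logic used in the data package, and the list of diagnoses is made up.

```python
from collections import defaultdict

# Toy mapping from the first letter of an ICD-10-CM code to its chapter.
# Only letters that belong to a single chapter are listed; this is NOT the
# grouping used by the data package, just an illustration.
CHAPTER_BY_LETTER = {
    "E": "Endocrine, nutritional and metabolic diseases",
    "I": "Diseases of the circulatory system",
    "J": "Diseases of the respiratory system",
}


def group_by_chapter(codes):
    """Group diagnosis codes under a chapter label, or 'Other' if unmapped."""
    groups = defaultdict(list)
    for code in codes:
        chapter = CHAPTER_BY_LETTER.get(code[:1].upper(), "Other")
        groups[chapter].append(code)
    return dict(groups)


# Made-up list of diagnosis codes for a handful of hypothetical patients.
diagnoses = ["E11.9", "I10", "J45.909", "E78.5", "I25.10"]
grouped = group_by_chapter(diagnoses)
for chapter, codes in grouped.items():
    print(f"{chapter}: {codes}")
```

Five individual codes collapse into three categories here; a real classification would use a curated lookup table rather than a first-letter rule.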



The John Snow Labs Census data package contains rich datasets about socioeconomic factors such as employment and unemployment, poverty levels, access to health insurance, race and ethnicity, and place-of-residence statistics, among many others. Since socioeconomic factors are a critical marker of population health, whether in a specific group or in the general population, policymakers and decision-makers can use the datasets as a basis for assessment, for understanding and planning preventive activities and healthcare resources, and for comparing different population groups and their unique needs.



John Snow Labs open data in the Core category features datasets on Health, Climate, Demographics, Economy, Geography, Pharmaceuticals, Transportation, and the Internet. Highlighted in this category are the 26 datasets in the Demographics data package, which contain statistical data on populations and specific sub-groups in different countries, the majority from the United States. The Health data package combines different health-related datasets covering vaccination, causes of death and death rates, diabetes, water consumption, water disposal, and many more.



The Cost category involves various complex datasets, mainly from the US Medicare and Medicaid programs, on factors that affect healthcare service fees, such as provider location, population demographics, service request types, financial capacity, access to services, and the overall state of the service market. John Snow Labs empowers decision makers to understand US healthcare cost and expenditure trends as a whole, along with the factors that influence price fluctuations. A few interesting datasets here include BSA Durable Medical Equipment Line Items PUF, Chronic Conditions Public Use File, and Medicare Cost Adults with Utilization and Quality Indicators by State.



This data package contains clinical practice guidelines, which are recommendations for clinicians on the care of patients with specific health needs. Anticoagulation Therapy Outcomes is a collection of numerous studies related to anticoagulation therapy. The NGC Guideline Summaries data package, on the other hand, contains summaries of evidence-based practice intended to optimize patient care through systematic review of the benefits and risks of alternative care options. MEDLINE PubMed Data consists of citations and abstracts from the National Library of Medicine (NLM) in the fields of medicine, nursing, dentistry, veterinary medicine, and other health care systems and preclinical sciences.


Hospital database

Besides cost, quality of care is an important factor patients consider when choosing a provider for their healthcare needs. The Hospitals data package addresses that need by providing the most up-to-date listings of physicians and their sub-specialties, highly skilled nursing staff, medical technicians, and health care administrators, along with the wide range of services and specialized medical equipment that facilities provide. Various datasets cover the timeliness and effectiveness of care, show the percentage of hospital patients who got the best treatment results, and report on serious medical conditions and surgical procedures. Hospital Compare datasets allow consumers to select multiple hospitals and directly compare performance information, including measures of mortality, safety of care, readmissions, patient experience, timeliness of care, effectiveness of care, and efficient use of medical imaging.



The Measures category contains information on metrics and standards of measurement for a diverse range of quality measures, such as air quality, Austin airport data, the LBB performance report, school surveys, child poverty, International System (SI) units, and weight measures, as well as Accountable Care Organizations (ACO) Quality Performance Measures and Standard Measures and Physician Quality Reporting System (PQRS), Performance Rates for Individual EP PQRS, CAHPS and Group Practice. This category is useful for providers, patients, consumers, and healthcare institutions alike.



The Outcomes data package contains distinct datasets related to Behavioral Risk Factor Health and Diseases Surveillance System, Health and Disease Registries, and Surgical Outcomes, all of which are linked to either medical or surgical care outcomes. This data package will come in very handy for doctors, medical health planners, psychologists, medical research groups, and behavioral therapists.



The Centers for Medicare and Medicaid Services (CMS) and the Social Security Administration (SSA) are the major sources of data for this data package. The datasets showcase the healthcare services delivered to eligible beneficiaries and the payments made by insurance organizations in both the public and private sectors. A key dataset in this category is the SSA Fast Track Process Public Use Files by State and Region, which facilitates comparison of expenditures by area and can serve as a benchmark for future health programs.



The John Snow Labs data package on Physicians delivers various lists of US-trained medical doctors, including those who received international medical training, across diverse sub-specialties. This data package also includes datasets on physician utilization and payment data, healthcare information, and Medicare cost and budget.


Population Health

John Snow Labs prides itself on its robust Population Health (aka PopHealth) datasets. The concept dates back to 2003, when it was described as “the health outcome of a group of individuals, including the distribution of such outcomes within the group”. The datasets deal with the distribution of health outcomes within a population, taking into account the personal, social, economic, and environmental factors that influence such outcomes, as well as the policies and interventions that affect these factors. They cover mortality, life expectancy, inequality measures, measures of morbidity, health surveys, health indicators, and interventions to address global health inequities. Public health officials and program managers seeking potential interventions, setting priority areas for action, and evaluating progress; policymakers developing policies; and clinicians and researchers seeking to improve health and reduce geographic disparities: all public health practitioners will greatly advance their efforts using this data package.



Providers, other than medical doctors in the US, encompass individual personnel, health facilities, and medical products. The datasets in this data package contain different types of provider information, covering utilization, payment, nursing, and dialysis facilities. Besides government and private health care facilities, there are also 355 registered free clinics in the United States that are considered part of the social safety net for those who lack health insurance. Comparisons between home and hospice services are also showcased in this catalog.



The Terminology category is a mix of some of the most important clinician- and patient-friendly medical terminologies and their databases, which can be shared consistently within and across healthcare settings. John Snow Labs, dedicated to providing quality datasets, has put together this data package so that the terminologies can be organized, queried, and analyzed. SNOMED Clinical Terms (CT) is a systematically organized, computer-readable collection of medical terms providing codes, terms, synonyms, and definitions used in clinical documentation and reporting, and is considered to be the most comprehensive, multilingual clinical healthcare terminology in the world. LOINC, another medical terminology database, is a rich catalog of measurements, clinical measures, standardized survey instruments, and more. LOINC enables the exchange and aggregation of clinical results for care delivery, outcomes management, and research by providing a set of universal codes and structured names to unambiguously identify things that can be measured or observed. RxNorm, another terminology, provides normalized names for clinical drugs that are linked to many drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, MediSpan, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.
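The mediating role described for RxNorm can be pictured as a hub: each vocabulary links its own codes to a shared concept, and translation goes through that concept. The Python sketch below shows the idea only. The vocabulary names, codes, concept identifiers, and the `translate` helper are all invented placeholders, not real RxNorm or vendor identifiers or any actual API.

```python
# All identifiers below are invented placeholders, not real RxNorm or vendor codes.
# Each source vocabulary maps its local code to one shared concept identifier.
TO_CONCEPT = {
    ("vocab_a", "A-001"): "concept-1",
    ("vocab_b", "B-900"): "concept-1",
    ("vocab_a", "A-002"): "concept-2",
    ("vocab_b", "B-901"): "concept-2",
}

# Invert the table so a concept can be resolved back into any vocabulary.
FROM_CONCEPT = {
    (concept, vocab): code for (vocab, code), concept in TO_CONCEPT.items()
}


def translate(code, source, target):
    """Translate a local drug code between vocabularies via the shared concept."""
    concept = TO_CONCEPT.get((source, code))
    if concept is None:
        return None  # the source system sent a code the hub does not know
    return FROM_CONCEPT.get((concept, target))


print(translate("A-001", "vocab_a", "vocab_b"))
```

With a hub like this, adding a new vocabulary means linking it to the shared concepts once, rather than building a separate mapping to every other vocabulary.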



Clinical Trials

The Clinical Trials category contains a clinical trial database of registry and results datasets on various clinical trials conducted worldwide. It compiles information on publicly and privately supported clinical trial studies covering a wide range of diseases and conditions. These datasets are useful for further research, as a source of primary data or for secondary analyses, as well as for planning future trials and public health strategies. This category is recommended for patients, family members, healthcare professionals, researchers, and the general public for easy access to information on both privately and publicly funded clinical trials. Highlighted in the Clinical Trials category are the datasets on the International Stroke Trial (IST), conducted on individual patients with acute stroke, and the replication data for the absence of an association between cord-specific antibody levels and severe respiratory syncytial virus (RSV) disease in early infants, from a case-control study from coastal Kenya.



Medical devices range from simple tongue depressors to complex programmable pacemakers with microchip technology; both traditional and innovative tools are extremely important pillars of healthcare. Under the Devices category, John Snow Labs has packed updated lists of medical devices approved by the Food and Drug Administration (FDA) under different procedures and programs. The category also includes information on the owners’ facilities involved in the production and distribution of these medical devices. Adverse events and other intriguing data related to the use of US medical devices are spread over more than 40 datasets, such as FDA Medical Device 510k Clearances, Manufacturer and User Facility Device Experience Database, and Medical Device Establishment Contact Addresses.


Drug Pricing

In addition to quality, safety, and efficacy, drug pricing is consequential in drug marketing and the entire healthcare system. John Snow Labs delivers a Drug Pricing database that gives the inquisitive pro and the curious learner access to updated drug prices at different levels, from National Drug Code to Healthcare Common Procedure Coding System Crosswalks. These datasets come with conversion factors and date changes, as well as lists of pharmacies and licensed pharmacists in the US. Featured datasets from Medicare and other official agencies include National Drug Code HCPCS Crosswalk, Medicare Prescription Drugs Claims, and CMS Drug Utilization Review.


Drug Safety

Drug safety is one of the most critical disciplines in daily medical practice, from approving new medications to withdrawing drugs from the market. John Snow Labs put together this catalog to provide easy access to the latest drug safety information. Information on which drug has the superior therapeutic effect and which medications hold the greatest risk for a certain condition can be retrieved from the FDA Drug Adverse Events Reporting System (FAERS) data package. Similar safety information for vaccines is accessible from the Vaccine Adverse Event Reporting System (VAERS) catalog. Other interesting Drug Safety databases can be found in the John Snow Labs Orange Book Approved and Purple Book Licensed Drug Products catalog.



John Snow Labs Food-related data packages contain nutrition-related information for the general public and professionals alike. An example is the Food Nutrition Value dataset, which answers the question “what’s in your food?” by supplying the calories, fat, protein, vitamins, and minerals contained in most food items. Other interesting datasets include the dietary supplement ingredient database, adverse food events, the nutrition assistance program, the international food consumption database, and a few datasets on restaurant inspection scores. This data package can play an integral role in maintaining balanced nutrition and a healthy lifestyle, thereby reducing the risk of chronic and life-threatening diseases like hypertension, diabetes, and cancer.



The Genomics catalog is a collection of datasets on the branch of genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism). This data package is a rich source of data on Gene Products and Targets, including antigen-antibody responses, microarray protein expression, endogenous ligands and peptides, ligand molecules, and interaction data for targets and ligands. Genetic Associations, on the other hand, offers biochemical protein-protein interaction, genetic variation, gene-chemical interaction, and protein kinase interactome data, as well as the Human Gene Expression Database, which contains expression profiles for proteins in normal and cancer tissues. This data package is recommended for researchers and students in pharmacology and drug discovery, and for the general public, as a source of accurate information on the basic science underlying drug use and mechanism of action.
