
122 NEW CORE DATASETS Are Now Freely Available in Partnership With The Open Knowledge Foundation

This month is another milestone for John Snow Labs as we forge our partnership with Open Knowledge International, “a global non-profit organization focused on realizing open data’s value to society by helping civil society groups access and use data to take action on social problems”. This vision is strongly in line with John Snow Labs’ mission to provide access to fresh and clean data – now not only to data scientists, but to everyone who has a need and meaningful use for it.

In this month’s release, John Snow Labs curated 145 datasets, 122 of which are in the Core category. The data covers economy, geography, climate, Internet, health and pharma. These are core components of every domain that deals with life, health, environment, politics and business, and they will help shape every area that matters to human life. The data can be used for NLP in financial services and in other fields.

“Updating the Master Provider Index is a ten-minute process with John Snow Labs. We’ve dramatically accelerated a process that used to be two days of work.” Claudiu Branzan, Principal Software Engineer, Atigeo

“Many people told me the datasets were great and very easy to use. We would love to continue partnering with you for future events!” Jason Yim, HopHacks Organizer

“The datasets were really clean, easy to access and easy to use. It was a joy to be able to use the data provided.” Eric Rothman, Co-Founder, Threat Sync



The Climate Category has 15 new datasets covering CO2 Emission Estimates and Euro Car Emissions, Absolute Sea Level Changes, Cumulative Mass Balance of Reference Glaciers, Atmospheric CO2 Trends, Annual and Monthly CO2 Marine Surface, European Union Emissions Trading System and the NASA GISS Surface Temperature Analysis. All these datasets have huge potential to help non-governmental organizations, students and government agencies that focus on saving and preserving planet Earth.


The Economy category, meanwhile, has 40 new datasets covering Incoterms Functions and Codes, UK Spending, Public Expenditures, Properties and Consumer Prices and Inflation Indexes, GDPs, Gold Prices, GINI Indexes, GHEITI Data, UK Bond Yields, Euribor Rates, IMF Database and Granularity, and even the Corruption Perceptions Index. There is also data on Belgium COFOG Nomenclature, UK COFOG Nomenclature, NASDAQ Exchange, The New York Stock Exchange, and other Exchange Listed Securities Companies that play a big role in the stock market. In addition, there is data on Standard and Poor’s 500 Index Earnings and PE Ratio, along with a list of companies and important financial information for each.


Geography is another topic that lends itself to collaboration with organizations that advocate for earth preservation, climate change action, good governance and other meaningful projects that will benefit countries around the globe. This category has 18 new datasets with information on Geonames and IDs, IMO IMDG Classification Codes, Airport Codes, ISO 4217 Currency Codes, ISO 639-1 and 639-2 Language Codes List, ISO 6346 Container Codes, ISO 3166-1-Alpha-2 English Country Names and Code Elements, ISO 3166 Country Codes ITU Dialling Codes ISO 4217 Currency Codes, IETF Language Tags, UN-CEFACT Package Codes, and Country and Continent Codes List. There are also statistics on Population Figures, Population Annual Time Series, Major Cities of the World, Spatial Relations Between Countries and Geographical Standards, Countries Geographical Territory Containment, Classification of the Functions of Government and Climate Negotiations Data.


Lastly, John Snow Labs has added 3 new datasets to the Internet Category, covering the List of Internet Media Types and Subtypes, Membership to International Copyright Treaties, Internet Top Level Domain Names and IPv4 Geolocations.

Health and Life Repositories

In addition, as part of John Snow Labs’ growing catalog, 42 new healthcare datasets have been added to the Health and Life repositories. In Health, 35 datasets were added to the Census, Hospitals, Measures, Outcomes, Pophealth and Providers categories. In Life, 7 datasets were added to the Clinical Trials, Devices and Genomics categories.

This is just the beginning of a healthy partnership between John Snow Labs and Open Knowledge International in the cause of health, so expect more cleaned and normalized datasets as the partnership grows.

John Snow Labs Datasets Monthly Release is now available in three simple and business-friendly licensing plans

With more than 2,000 datasets across 15 areas of human health and well-being, now you can get all the benefits of John Snow Labs datasets in just three easy subscription plans:

  1. Full Download: Load the full datasets into your database or analytics platform for maximum performance & privacy
  2. Keep It Forever: Ending a subscription? Keep the data and rights you already have
  3. Royalty Free: Deploy the data as part of your commercial product

These plans give you turnkey data for analysis, already tested, optimized and customized in a ready-to-use format for your big data, data science or visualization platform.

Visit our new Data Libraries here to learn more.
