How It Works
Welcome to the Land of Clean Data!
- Two manual reviews are done by domain experts
- Then, an automated set of 60+ validations ensures that every datum matches its metadata & defined constraints
- All dates, units, codes, currencies look the same
- All null values are normalized to the same value
- All dataset and field names are SQL and Hive compliant
- Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
- Metadata is provided in the open Frictionless Data standard, and every field is normalized & validated
- Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
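The kinds of automated checks listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the set of null tokens, the identifier rule, and the function names are hypothetical and are not the actual validation suite. The constraint keys (`required`, `enum`, `maxLength`) follow the Frictionless Data Table Schema vocabulary.

```python
import re

# Hypothetical set of raw tokens treated as "missing"; the real list is an assumption.
NULL_TOKENS = {"", "na", "n/a", "null", "none", "-"}

# Conservative identifier rule assumed here: a letter or underscore first,
# then only letters, digits, and underscores.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def normalize_null(value):
    """Map every spelling of a missing value to a single representation (None)."""
    if value is None or value.strip().lower() in NULL_TOKENS:
        return None
    return value.strip()


def is_compliant_name(name):
    """Return True if the name is a plain identifier usable unquoted in SQL and Hive."""
    return bool(IDENTIFIER.match(name))


def check_constraints(value, constraints):
    """Check one normalized value against Table Schema style constraints."""
    if value is None:
        # A missing value is acceptable only when the field is not required.
        return not constraints.get("required", False)
    if "enum" in constraints and value not in constraints["enum"]:
        return False
    if "maxLength" in constraints and len(value) > constraints["maxLength"]:
        return False
    return True
```

A real pipeline would run checks like these over every field of every dataset; the sketch only shows the shape of a single check.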
Welcome to Expert Curated Data!
Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
The data is always kept up to date – even when the source requires manual effort to get updates
Support for data subscribers is provided directly by the domain experts who curated the data sets
Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution
Welcome to Easy to Use Data!
- Read CSV or Parquet data with one-liners from the standard libraries of Python, R, SAS, SPSS, or Spark;
- Full download of data enables you to get the most out of your memory, database, or cluster;
- Subscribe to dataset updates to automate them.
- 26 out-of-the-box integrations with the world’s most popular analytics tools, via our data.world partnership;
- SQL and SPARQL queries via a web UI or REST API.
- Need to load 1,000 datasets into a SQL or Hive DB? Create and populate all tables with one script, thanks to the complete & standardized schemas in metadata.
- Don’t know the jargon? Our experts curate extra search terms so that you can find “NPPES” also by “all US doctors” or “national providers database”.
- Not sure what the data is about? Metadata is provided in human-readable PDF in addition to JSON.
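As a rough illustration of how standardized schemas in metadata make table creation scriptable, the following Python sketch builds a `CREATE TABLE` statement from a Frictionless-style JSON descriptor and loads a CSV into an in-memory SQLite database using only the standard library. The descriptor, the sample rows, the type mapping, and the function names are all hypothetical; the actual metadata files may carry more detail.

```python
import csv
import io
import json
import sqlite3

# Hypothetical metadata in the Frictionless resource layout (name, schema.fields).
METADATA_JSON = """
{
  "name": "providers",
  "schema": {
    "fields": [
      {"name": "npi", "type": "integer"},
      {"name": "provider_name", "type": "string"},
      {"name": "enumeration_date", "type": "date"}
    ]
  }
}
"""

# Hypothetical CSV content; in practice this would be a downloaded file.
CSV_TEXT = (
    "npi,provider_name,enumeration_date\n"
    "1234567890,Jane Doe,2006-05-23\n"
    "1098765432,John Roe,2007-01-15\n"
)

# Assumed mapping from Table Schema types to SQL column types.
SQL_TYPES = {
    "integer": "INTEGER",
    "number": "REAL",
    "string": "TEXT",
    "date": "DATE",
    "boolean": "BOOLEAN",
}


def create_table_sql(resource):
    """Build a CREATE TABLE statement from a resource descriptor.

    Names are interpolated directly, which is only safe because dataset and
    field names are assumed to be plain SQL-compliant identifiers.
    """
    columns = ", ".join(
        f'{field["name"]} {SQL_TYPES.get(field["type"], "TEXT")}'
        for field in resource["schema"]["fields"]
    )
    return f'CREATE TABLE {resource["name"]} ({columns})'


def load_csv(connection, resource, csv_text):
    """Create the table and insert every CSV row; return the number of rows loaded."""
    connection.execute(create_table_sql(resource))
    names = [field["name"] for field in resource["schema"]["fields"]]
    placeholders = ", ".join("?" for _ in names)
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[name] for name in names) for row in reader]
    connection.executemany(
        f'INSERT INTO {resource["name"]} VALUES ({placeholders})', rows
    )
    return len(rows)


resource = json.loads(METADATA_JSON)
connection = sqlite3.connect(":memory:")
loaded = load_csv(connection, resource, CSV_TEXT)
```

Looping the same two functions over many descriptors is what would turn this into the "one script" for a large catalog; a Hive target would need a different type mapping.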
Frequently Asked Questions
Through the Data Market, John Snow Labs offers a wide range of health and life science datasets and data packages.
John Snow Labs offers access to datasets that have been curated by a team of specialists in the health and life science domains. Thanks to the team's extensive experience in data acquisition, curation, normalization, and publishing, our datasets are cleaner, better documented, better structured, and richer in useful information than the free equivalents offered by well-established and trustworthy data publishers.
Our datasets are extremely easy to understand, use and integrate into your existing systems and tools. You can find a list of our databases on our vendors page.
Every single dataset on John Snow Labs has a fully transparent link back to its source. This means you can always verify the data as published by its original source. Transparency is the ultimate enabler of trust.
The main customers targeted by John Snow Labs Data Library are:
- Healthcare and Life Science application providers;
- Data integrators that want to provide data-centered services and are interested in John Snow Labs datasets;
- SMEs that want to develop new products based on health and life science data;
- CIOs, CEOs, and CTOs of healthcare-related businesses;
- Data scientists;
- Data publishers that want to integrate their datasets with complementary health and life science datasets for a richer context and relevance.
The John Snow Labs Data Library is an online data repository that allows users to access, download, and use datasets or data packages (groups of related datasets) curated by the John Snow Labs team of experts. It is a quick and easy-to-access gateway to the John Snow Labs data catalog, a unique collection of normalized, clean, and enriched health and life science datasets.
The data library contains virtual products in the form of datasets and data packages that can be downloaded and used:
- for research purposes for free and
- for commercial purposes after paying a subscription fee.
As long as the subscription is valid, the user will have a commercial license to use the datasets and will get all available updates.
The Data Library provides a dedicated web page where users can search for the datasets they are interested in and explore the available data catalog.
The search functionality works on both dataset name and dataset description. By default, all available datasets are displayed as a list of products.
The following information is available for each product, on the main shop page:
- name of the dataset;
- relevant short description;
- image that identifies the name of the data package that includes the current dataset;
- data download button for logged in users.
The Data Library provides dedicated pages for all available datasets and data packages.
The dataset details page includes the following information:
- the dataset name;
- license information for the logged in user;
- direct download links for CSV data, PDF reference file, and JSON metadata file;
- the image associated with the dataset;
- a short description of the dataset;
- a detailed description of the dataset;
- a clear description of the list of fields together with typing information;
- data preview;
- a data package section that briefly describes the data package that includes the current dataset;
- a related dataset section containing all datasets that are in the same accelerator as the current dataset
A data package is a group of datasets that are related. In other words, datasets included in the same package describe the same data from different points of view or describe complementary data or data that is somehow related.
The datasets published on John Snow Labs Data Library are premium quality datasets already tested, optimized and customized in a ready to use format.
Extensive efforts have been invested in preparing and optimizing those datasets for immediate use:
- They have been curated by human experts;
- They come in out-of-the-box optimized data formats for R, Python, SAS, Hadoop, Spark, SQL & BI tools;
- Daily updates are integrated and published so the user can get automatic, versioned, clean & tested updates as they happen;
- All data is under one license with royalty-free, commercial redistribution rights;
- Datasets are triple-checked, both automatically and manually, to make sure that they are error-free and ready for production use;
- Our datasets are clean and interoperable. For this, we are using a unified and standards-based data model – including numbers, dates, units, currency, null values, identifiers & references.
By using our datasets you will save more than 4,000 hours in data preparation (cleaning, transformation, normalization, etc.) each month.
We offer you turnkey data for analysis already tested, optimized and customized in a ready to use format for your big data, data science or visualization platform.
A user can cancel any order that has on-hold status. On-hold status means that the payment has not been processed yet. Once the payment is processed, the user receives a commercial license agreement for the entire data catalog and the order can no longer be canceled.
An order cancellation does not imply any payment/penalty.
Any active subscription on John Snow Labs Data Library can be canceled at any time. The cancellation of a subscription stops future renewal charges but does not result in a refund of your order.
Commercial use of the dataset(s) is still allowed until the day the current subscription expires.
The use of John Snow Labs datasets is free forever for academics, researchers, and students.
The Data Library allows users to easily buy a subscription to the entire catalog. The subscription functionality is accessible from the Data Library main page.
By clicking on the Subscribe to Data Library buttons on the Data Library main page, or on the dataset details pages, the subscription is added to your cart. Once the order is placed and the payment is confirmed, the user will gain commercial rights to all datasets.
The subscription is valid for one year and entitles the user to instantly access all available data updates and an unlimited number of downloads.
The payment methods currently supported by John Snow Labs Data Library are:
- Credit card directly on our website;
- Bank transfer to the account received via e-mail once the order is confirmed.
Subscriptions to John Snow Labs Data Library are not returnable or refundable after purchase.
Orders with status on hold can be canceled for free.
A user can cancel any order that has on-hold status. On-hold status means that the payment has not been processed yet. Once the payment is processed, the order can no longer be canceled. An order cancellation does not imply any payment/penalty.
Active subscriptions can be canceled but we do not provide any reimbursement for the already paid subscriptions.
Any active subscription on John Snow Labs Data Library can be canceled at any time. The cancellation of a subscription stops future renewal charges but does not result in a refund of your order.
Commercial exploitation rights to the datasets will be valid until the day the current subscription expires.
Your list of subscriptions can be accessed in your account section of the Data Library.