The datasets published on the John Snow Labs Data Library are premium-quality datasets that have already been tested, optimized, and customized into a ready-to-use format.
Extensive effort has gone into preparing and optimizing these datasets for immediate use:
– They have been curated by human experts;
– They come in out-of-the-box optimized data formats for R, Python, SAS, Hadoop, Spark, SQL & BI tools;
– Daily updates are integrated and published, so users get automatic, versioned, clean & tested updates as they happen;
– All data is under one license with royalty-free, commercial redistribution rights;
– Datasets are triple-checked, both automatically and manually, to make sure that they are error-free and ready for production use;
– Our datasets are clean and interoperable: we use a unified, standards-based data model that covers numbers, dates, units, currency, null values, identifiers & references.