How much do you spend on going from science to product?
Data scientists are experts at experimenting and optimizing machine learning models, but not at delivering production-grade software. The software engineering work required to deploy, scale, secure, monitor and upgrade models in production is often its own lengthy project once a data science project is ‘done’.
John Snow Labs has built the software that takes a prototype that runs on a laptop and makes it run reliably in a production environment at scale.
Natural Language Understanding
We have developed and open sourced the Natural Language Understanding library for Apache Spark, and we provide enterprise support for it. It is several times faster than leading Python and Scala libraries, and enables frictionless reuse of all Spark ML libraries. When the code is ready, deploy it as-is to a production Spark cluster or model server.
“John Snow Labs’ analysis has given me new ideas for exploring future data we may collect, as well as helped to inform the kinds of data we need to complete further analysis in the future.” Jennifer Cohen, Lead Project Manager at SHM Foundation
You are what you eat. What data do you feed your models?
Data for training machine learning models is plentiful these days. High quality data? Not so much. A great deal of work by human domain experts is often still required to get data that is highly accurate, up to date, reliable, enriched, and fitted to the specific problem you’re solving. Keeping that data up to date, relevant and representative as time goes by is another operational headache.
John Snow Labs has built a data curation process combining human domain experts and a software platform. Data is optimized for and tested on the latest data science platforms, and full metadata is provided based on the latest data governance standards.
The Open Knowledge Foundation provides a data hub with hundreds of free data sets for the benefit of the open data community. John Snow Labs has taken on the curation and upkeep of over 120 core data sets in the areas of Climate, Economy, Geography, Health, Pharma and the Internet. All data and metadata are provided and validated against the Frictionless Data Package specification.
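To give a feel for what validating against a Data Package descriptor involves, here is a minimal sketch in Python. It checks only a small subset of the specification's rules (a `resources` list, a `name` per resource, and exactly one of `path` or `data`); the function name `check_descriptor` and the sample data set are hypothetical, and this is not John Snow Labs' validation suite. Full validation would normally rely on the official Frictionless tooling.

```python
import json


def check_descriptor(descriptor):
    """Return a list of problems found in a Data Package descriptor (a dict).

    Covers only a simplified subset of the specification, for illustration.
    """
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        return ["descriptor must contain a non-empty 'resources' list"]

    problems = []
    for i, resource in enumerate(resources):
        if not isinstance(resource, dict):
            problems.append(f"resource {i}: must be an object")
            continue
        if "name" not in resource:
            problems.append(f"resource {i}: missing 'name'")
        # The data location is given by 'path' or 'data', but not both.
        if ("path" in resource) == ("data" in resource):
            problems.append(f"resource {i}: needs exactly one of 'path' or 'data'")
    return problems


# Hypothetical descriptor, as it might appear in a datapackage.json file.
example = json.loads("""
{
  "name": "sample-climate-data",
  "resources": [
    {"name": "monthly-temperatures", "path": "data/monthly.csv"}
  ]
}
""")

print(check_descriptor(example))  # an empty list means no problems were found
```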
“John Snow Labs provided professional and friendly service to us as well as thorough analysis of complex data. Where relevant, I think they could contribute positively to future projects. I would recommend them for their availability, personal service and commitment and enthusiasm for the projects they assist.” Anna Kydd, Founder & Director at SHM Foundation
Machine learning has a much higher quality bar than BI
Data quality is a basic requirement prior to any analysis. However, machine learning requires more than just checking that null values, identifiers, dates, units and currencies are handled consistently. Do you have the tools, people and process to check for bias in your data? To account for under-representation of part of the population you’re modeling? To evaluate how fast your training data must change over time?
John Snow Labs applies three quality reviews to every data set: two levels of manual review by domain experts, and an automated suite of over 50 data & metadata quality tests.
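As an illustration of the kind of check that goes beyond nulls and units, the sketch below flags groups that are under-represented in a training sample relative to a reference population. Everything in it is an assumption made for the example: the function name, the age bands, the reference shares and the 50% tolerance are invented, and it is not one of the tests in the suite described above.

```python
from collections import Counter


def underrepresented_groups(sample_groups, reference_shares, tolerance=0.5):
    """Flag groups whose share of the sample falls below
    tolerance * their share of the reference population."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    flagged = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed < tolerance * expected:
            flagged[group] = (observed, expected)
    return flagged


# Illustrative numbers only: one group label per training record.
sample = ["18-39"] * 60 + ["40-64"] * 36 + ["65+"] * 4
reference = {"18-39": 0.40, "40-64": 0.40, "65+": 0.20}

for group, (observed, expected) in underrepresented_groups(sample, reference).items():
    print(f"{group}: {observed:.0%} of sample vs {expected:.0%} of population")
```

With these numbers only the "65+" group is flagged, since it makes up 4% of the sample against an assumed 20% of the population.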
Data Philanthropy & the 1% Pledge
John Snow Labs has pledged to donate 1% of product every year for the benefit of mankind. We support Data for Good projects, hackathons and philanthropic projects with free data, software and services. If you’re running a hackathon or accelerator, reach out to us about giving your audience free access to our full catalog, and a year-long commercial license & support for the winning project.
“I believe our collaboration on the DataforGood Hackathon was productive and seamless. We really appreciated your professionalism, promptitude and dedication when it comes both to the data quality and project management.” Chi Pakarinen, Project Manager at The Synergist
Just moving the data into one place leaves your data scientists doing most of this work
Data flows into an analytics platform from many sources, in many formats and schemas. If it’s streaming or regularly updated, there is an operational burden to monitor servers and data quality. Beyond that, semantic interoperability is an issue for data science projects: How can you trust that data from different sources, even if it has the same field name and type, really means the same thing?
John Snow Labs’ Data Lab includes an enterprise-grade data integration & data quality platform. It supports both batch and streaming data, visual data flows, a variety of built-in transformation & quality tools, and fine-grained data provenance.
Securing a big data or data science platform, especially when personal health or financial data is involved, is hard. During many man-years of operating such platforms, we’ve built our own threat intelligence data feeds. They integrate over 80 data sources that are constantly updated, filtered, de-duplicated, ranked, and published to protect our customers.
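The filter, de-duplicate and rank steps can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the production feed: the feed names are hypothetical, the indicators use reserved documentation addresses and `example.com`, and ranking by the number of distinct sources that report an indicator is just one simple heuristic chosen for the example.

```python
from collections import defaultdict


def merge_feeds(feeds):
    """Merge {source_name: [indicator, ...]} into a ranked, de-duplicated list
    of (indicator, number_of_sources) tuples."""
    sources_by_indicator = defaultdict(set)
    for source, indicators in feeds.items():
        for raw in indicators:
            indicator = raw.strip().lower()  # normalize so duplicates collapse
            if indicator:  # filter out blank entries
                sources_by_indicator[indicator].add(source)
    # Most widely reported first; ties broken alphabetically for stable output.
    ranked = sorted(
        sources_by_indicator.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return [(indicator, len(sources)) for indicator, sources in ranked]


# Hypothetical feeds with overlapping, inconsistently formatted entries.
feeds = {
    "feed_a": ["198.51.100.7", "Bad.Example.com", ""],
    "feed_b": ["bad.example.com ", "203.0.113.9"],
    "feed_c": ["bad.example.com", "198.51.100.7"],
}

for indicator, source_count in merge_feeds(feeds):
    print(indicator, source_count)
```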
“The JSL combination of industry expertise and cybersecurity intelligence helped Atigeo identify 4.5 times more cyber threats for one customer.” Lead Project Manager at Atigeo
How fast can you go to market in a high compliance industry and still sleep at night?
Data security and privacy
We all want to guarantee that every step we take is legal, ethical and benefits mankind. In today’s data science landscape, this often requires data scientists to make privacy-utility tradeoffs and balance architecture, algorithms and process controls on their own. Even worse, teams must often implement cryptographically correct de-identification algorithms, or assert compliance against cyber-security and data privacy attacks, without having the right expertise.
John Snow Labs provides managed security services as well as data privacy & compliance project implementations. Our expertise is focused on helping data science projects in the healthcare and life science industries.
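One small building block of de-identification is keyed pseudonymization: replacing a direct identifier with a token that is stable for joins but cannot be recomputed without a secret key. The sketch below uses HMAC-SHA256 from the Python standard library. The function name, the sample identifier and the key are placeholders for the example, and this is not a description of John Snow Labs' implementation. Pseudonymizing direct identifiers does not by itself de-identify a data set, because quasi-identifiers such as dates or postal codes can still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    Unlike a plain hash, the result cannot be reproduced by someone who
    guesses the identifier but does not hold the secret key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder key: in practice this would be randomly generated and stored
# in a secrets manager, never committed alongside the data or the code.
key = b"replace-with-a-randomly-generated-secret"

token = pseudonymize("patient-00123", key)
print(token)
```

The same identifier and key always yield the same token, so records can still be linked across tables after the original identifier is removed.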
Schedule a call
To avoid issues like this and make the most of machine learning, it’s important to follow all five levels of the maturity model of a productive analytics platform:
- Data Engineering
- Data Curation
- Data Quality
- Data Integration
- Data Security & Privacy
Tell us what you are doing and what complications you are facing, and John Snow Labs will integrate our solutions into your operations.