
Add AI & NLP to your Clinical Trial Master File (eTMF) system

John Snow Labs’ automation extracts specific information from unstructured data, enabling semantic search, document classification, and metadata extraction. Get drugs to market faster with dramatically reduced manual labor.
80% time and labor savings

Webinar

Automating Clinical Trial Master File Migration & Information Extraction

End-to-end AI-enabled solution

  • Built on the Apache Spark big data framework; scales with the size of the data
  • Holds 9 records for the most accurate method globally
  • Allows human in the loop and exception handling
  • Secure and compliant – runs on your IT infrastructure

Semantic search

AI/NLP based:
  • Automatic handling of synonyms
  • Handling of misclassified documents
Handling human errors:
  • Documents accidentally saved in the wrong place
  • Misclassified documents, typos
  • Mismatch between extracted metadata & content of the document
  • Duplicates
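
Two of the checks listed above, duplicate detection and metadata/content mismatch, can be illustrated with a minimal sketch. This is not the product's method: the function names, the exact-match hashing approach, and the substring check are all assumptions chosen for illustration, and a real system would need fuzzy matching to cope with OCR noise and typos.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_fingerprint(text: str) -> str:
    """SHA-256 of the normalized text; equal fingerprints mean identical normalized content."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_duplicates(documents: dict[str, str]) -> list[list[str]]:
    """Group document ids (hypothetical keys) whose normalized content is identical."""
    groups: dict[str, list[str]] = {}
    for doc_id, text in documents.items():
        groups.setdefault(content_fingerprint(text), []).append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def metadata_mismatches(metadata: dict[str, str], text: str) -> list[str]:
    """Return metadata fields whose value never appears in the document text."""
    haystack = normalize(text)
    return [field for field, value in metadata.items() if normalize(value) not in haystack]
```

Flagged duplicates and mismatches would then be candidates for exception handling rather than automatic correction.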

Classification and metadata extraction

Extracted information:
  • Artifact: Protocol Signature Page, Principal Investigator’s CV
  • Version number
  • Principal investigator’s last name
  • Signature date
Challenges:
  • Multiple dates present in text
  • Hand-written dates and names
  • OCR-related issues
Example of extracted information:
  • Artifact: Informed Consent Form, Site Staff Qualification Supporting Information
  • First name and last name
  • Relevant date
  • Role
  • and more.
Challenges:
  • Date selection (e.g., an expiration date may be extracted depending on the presence of other dates)
  • Role extracted from content or mapped from metadata
  • ICF Type extracted from the content or from the metadata, depending on the case
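
The date-selection challenge above can be sketched as a simple rule. Everything here is an assumption made for illustration: the ISO date format, the cue words, the context window, and the fallback order (prefer a date with no expiration cue, otherwise fall back to an expiration date). The source does not describe the actual rules, which also have to handle hand-written and OCR-damaged dates.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: dates appear as YYYY-MM-DD; real documents use many more formats.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Assumption: these cue words mark an expiration date when they precede it.
EXPIRY_CUES = ("expir", "valid until", "valid through")


def parse_iso(raw: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string, returning None for impossible dates."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return None


def select_relevant_date(text: str, window: int = 30) -> Optional[date]:
    """Pick the first non-expiration date; fall back to the first expiration date."""
    preferred, expiry = [], []
    for match in ISO_DATE.finditer(text):
        parsed = parse_iso(match.group(0))
        if parsed is None:
            continue
        # Look only at the text immediately before the date for an expiration cue.
        context = text[max(0, match.start() - window):match.start()].lower()
        bucket = expiry if any(cue in context for cue in EXPIRY_CUES) else preferred
        bucket.append(parsed)
    if preferred:
        return preferred[0]
    return expiry[0] if expiry else None
```

With this rule, an expiration date is returned only when no other date is present, which is one way to read "depending on the presence of other dates".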
State-of-the-art accuracy
  • Based on award-winning Spark NLP software

  • Combination of NLP and user-defined rules

Faster & smarter
  • 80% reduction in manual labor

  • 80% reduction in migration timeline

Secure and compliant
  • On-premises, air-gapped installation

  • Proven technology

  • GxP validated

Automatic accuracy and confidence estimation
  • Automatic detection of whether the extracted information is correct

  • Reduction of false positives is critical for business success

  • Machine learning method
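
One way a confidence estimate can feed a human-in-the-loop workflow is threshold routing, sketched below. The `Extraction` type, the `route_by_confidence` function, and the 0.9 threshold are hypothetical; the confidence score is assumed to come from a separately trained model that this sketch does not implement.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """One extracted attribute with a model-estimated confidence (assumed in [0, 1])."""
    field: str
    value: str
    confidence: float


def route_by_confidence(
    extractions: list[Extraction], threshold: float = 0.9
) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into auto-accepted items and items sent for human review."""
    accepted: list[Extraction] = []
    review: list[Extraction] = []
    for item in extractions:
        (accepted if item.confidence >= threshold else review).append(item)
    return accepted, review
```

Raising the threshold sends more items to review and lowers the risk of accepting a wrong value; lowering it does the opposite.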

Proven in the real world: Novartis

A year-long migration project from a legacy document system to a new enterprise document management system

  • 48 Artifacts (document classes) of DIA TMF Reference Model, e.g., Site Staff Qualification Supporting Information, Sub-Investigator Curriculum Vitae, FDA 1572
  • 29 Attributes, e.g., First Name, Last Name, Signature Date, Expiration Date, License Date