An AI-based solution that uses transfer learning to deliver a future-proof model for converting source-agnostic unstructured data into structured data. It supports the classification of artifacts and sub-artifacts and the extraction of the metadata defined in the TMF Reference Model.
The core pipeline comprises OCR-based text extraction, language detection, layout- and content-based document classifiers, more than 40 deep learning (DL) based named entity recognition (NER) models, handwritten text detection, handwritten date extraction, and artifact-based post-processing rules. Each NER model is trained on a set of document types and extracts the target entities for the given document type. Together, these stages automate the migration between different document management systems in an air-gapped network.
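The stage ordering described above can be illustrated with a minimal Python sketch. It is not the solution's actual implementation: every name here (`Document`, `ocr`, `classify`, `protocol_ner`, `NER_MODELS`, `run_pipeline`) is hypothetical, each stage is a trivial placeholder for the real model, and the handwritten text and date stages are left out for brevity. The point it shows is the routing step, where the predicted document type selects which NER model runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical placeholders, not the product's actual API.


@dataclass
class Document:
    raw: str                 # stands in for a scanned page image
    text: str = ""
    language: str = ""
    doc_type: str = ""
    entities: Dict[str, str] = field(default_factory=dict)


def ocr(doc: Document) -> None:
    # Placeholder: a real stage would run an OCR engine on the image.
    doc.text = doc.raw


def detect_language(doc: Document) -> None:
    # Placeholder heuristic standing in for a language-detection model.
    doc.language = "de" if " und " in f" {doc.text.lower()} " else "en"


def classify(doc: Document) -> None:
    # Placeholder for the layout- and content-based document classifiers.
    doc.doc_type = "protocol" if "protocol" in doc.text.lower() else "unknown"


def protocol_ner(text: str) -> Dict[str, str]:
    # Placeholder for one per-document-type NER model:
    # takes the token that follows the word "study" as the study identifier.
    words = text.split()
    for i, word in enumerate(words[:-1]):
        if word.lower() == "study":
            return {"study_id": words[i + 1]}
    return {}


# Registry that routes each document type to its own NER model.
NER_MODELS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "protocol": protocol_ner,
}


def extract_entities(doc: Document) -> None:
    # Only run NER when a model is registered for the predicted type.
    model = NER_MODELS.get(doc.doc_type)
    if model is not None:
        doc.entities = model(doc.text)


def post_process(doc: Document) -> None:
    # Placeholder artifact rule: strip trailing punctuation and upper-case values.
    doc.entities = {k: v.strip(".,;").upper() for k, v in doc.entities.items()}


STAGES: List[Callable[[Document], None]] = [
    ocr, detect_language, classify, extract_entities, post_process,
]


def run_pipeline(raw: str) -> Document:
    # Apply every stage in order to a single document.
    doc = Document(raw=raw)
    for stage in STAGES:
        stage(doc)
    return doc
```

Keeping the NER models in a registry keyed by document type means a new document type can be supported by registering one more model, without touching the other stages.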