SUSIE: Pharmaceutical CMC ontology-based information extraction for drug development using machine learning

Vipul Mann, Shekhar Viswanath, Shankar Vaidyaraman, Jeya Balakrishnan, Venkat Venkatasubramanian

doi:10.1016/j.compchemeng.2023.108446

Abstract

Automatically extracting information from unstructured text in pharmaceutical documents is important for drug discovery and development. This information can be integrated with structured datasets to accelerate pharmaceutical product development. To this end, we report an end-to-end information extraction framework built on a custom pharmaceutical drug development ontology, a weak supervision framework, contextualization algorithms, and a fine-tuned BioBERT model (an adaptation of BERT, Bidirectional Encoder Representations from Transformers, to biomedical text). The proposed framework, SUSIE (Schema-based Unsupervised Semantic Information Extraction), was trained on ICH (International Conference on Harmonization) documents to identify important entities and relations in unstructured text and to automatically generate knowledge graphs that represent the key information in a structured format. On the entity identification task, the framework achieves a test accuracy of 96% and an F1-score of 88% on out-of-sample documents. A major contribution of this work is an automated, unsupervised information extraction framework built around a domain-specific, custom pharmaceutical drug development ontology, with no need to manually curate training datasets for specific tasks. The approach was evaluated on out-of-sample documents, including an internal Eli Lilly technical document.
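The abstract describes two ideas that are easy to misread in the abstract: using an ontology (rather than hand-annotated data) as the source of entity labels, and turning recognized entities into knowledge-graph triples constrained by a schema. The sketch below illustrates both under loud assumptions. Plain dictionary matching stands in for the paper's weak supervision framework, and the contextualization algorithms and BioBERT fine-tuning are not shown. Everything named here (`ONTOLOGY`, `SCHEMA`, `weak_label`, `spans`, `triples`, and the entity classes and relation names) is hypothetical and is not the authors' ontology or implementation. In a setup like this, the BIO tags produced by `weak_label` would be the kind of noisy labels a token-classification model could be trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-ontology: entity class -> surface terms (lower-case, tokenized).
# Illustrative only; not the SUSIE ontology.
ONTOLOGY: Dict[str, List[Tuple[str, ...]]] = {
    "PROCESS": [("crystallization",), ("wet", "granulation")],
    "MATERIAL": [("drug", "substance"), ("excipient",)],
    "ATTRIBUTE": [("particle", "size"), ("impurity",)],
}

# Hypothetical schema: relation allowed between a (head class, tail class) pair.
SCHEMA: Dict[Tuple[str, str], str] = {
    ("PROCESS", "ATTRIBUTE"): "affects",
    ("MATERIAL", "ATTRIBUTE"): "has_attribute",
}


def weak_label(tokens: List[str]) -> List[str]:
    """Assign BIO tags by greedy longest-match against ontology terms."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Longest terms first so multi-word terms win over their sub-terms.
    terms = sorted(
        ((term, cls) for cls, ts in ONTOLOGY.items() for term in ts),
        key=lambda x: -len(x[0]),
    )
    i = 0
    while i < len(tokens):
        for term, cls in terms:
            n = len(term)
            if tuple(lowered[i:i + n]) == term:
                tags[i] = f"B-{cls}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{cls}"
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts here
    return tags


def spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, class) pairs from BIO tags."""
    out: List[Tuple[str, str]] = []
    current: List[str] = []
    cls = ""
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                out.append((" ".join(current), cls))
            current, cls = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                out.append((" ".join(current), cls))
            current = []
    if current:
        out.append((" ".join(current), cls))
    return out


def triples(entities: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Link entity pairs in one sentence whose classes the schema allows."""
    result = []
    for h_text, h_cls in entities:
        for t_text, t_cls in entities:
            rel = SCHEMA.get((h_cls, t_cls))
            if rel:
                result.append((h_text, rel, t_text))
    return result


if __name__ == "__main__":
    sentence = "Wet granulation can change the particle size of the drug substance ."
    toks = sentence.split()
    ents = spans(toks, weak_label(toks))
    for triple in triples(ents):
        print(triple)
```

For the example sentence, this yields the triples `("Wet granulation", "affects", "particle size")` and `("drug substance", "has_attribute", "particle size")`. Linking every schema-compatible pair within a sentence is a deliberately naive choice. A real system would need context to decide which pairs are actually related, which is presumably part of what the paper's contextualization step addresses.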

Journal: Computers & Chemical Engineering
Publication Date: Oct 7, 2023
Citations: 2