Abstract

Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision making and regulatory management. Effective extraction of drug indication information from free-text resources could facilitate drug repositioning projects and help collect real-world evidence in support of secondary use of approved medicines. To enable AI-powered language models for the extraction of drug indication information, we manually read and curated FDA-approved human prescription drug labeling to develop a Drug Indication Classification and Encyclopedia (DICE). The resulting DICE scheme comprises 7,231 sentences categorized into five classes (indications, contraindications, side effects, usage instructions, and clinical observations). To further elucidate the utility of DICE, we developed nine different AI-based classifiers for the prediction of indications and comprehensively assessed their performance. The transformer-based language models yielded an average Matthews correlation coefficient (MCC) of 0.887 on the test set, outperforming the word embedding-based bidirectional long short-term memory (BiLSTM) models (0.862) with a 2.82% improvement. The best classifiers were also used to extract drug indication information from DrugBank and achieved a high enrichment rate (>0.930) for this task. We found that domain-specific training could provide more explainable models without sacrificing performance, along with better generalization to external validation datasets. Altogether, the proposed DICE could serve as a standard resource for the development and evaluation of task-specific, AI-powered natural language processing (NLP) models.
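The MCC values quoted above summarize a binary confusion matrix in a single number between -1 and 1. As a minimal sketch (not the authors' evaluation code), the standard definition can be computed as follows; the function names and the confusion-matrix counts are hypothetical and chosen only for illustration. The last helper shows one reading that reproduces the reported 2.82% figure: the difference between the two average MCCs taken relative to the transformer score (0.887), which is an assumption about how the percentage was derived.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (the usual convention,
    since the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def percent_difference(higher: float, lower: float, reference: float) -> float:
    """Difference between two scores as a percentage of a chosen reference."""
    return (higher - lower) / reference * 100


if __name__ == "__main__":
    # Hypothetical counts for an "indication vs. not indication" classifier.
    print(f"MCC = {mcc(tp=420, tn=900, fp=35, fn=45):.3f}")

    # Average MCCs quoted in the abstract.
    transformer, bilstm = 0.887, 0.862
    # Assumption: the gap is expressed relative to the transformer score.
    print(f"{percent_difference(transformer, bilstm, transformer):.2f}%")
    # Relative to the BiLSTM score the same gap would be slightly larger.
    print(f"{percent_difference(transformer, bilstm, bilstm):.2f}%")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a reasonable choice when the classes (here, indication sentences versus the other four categories) are imbalanced.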

Highlights

  • Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision making and regulatory management

  • We developed a five-category Drug Indication Classification and Encyclopedia (DICE) based on Food and Drug Administration (FDA) approved human prescription drug labeling to facilitate the development of AI-based natural language processing (NLP) approaches for enhanced drug indication extraction from free text-based document resources

  • Bidirectional Long Short-Term Memory: To better understand the framework and theory behind bidirectional long short-term memory (BiLSTM), we provide a brief introduction to the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)


Summary

Introduction

Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision making and regulatory management. The primary role of drug indications is to enable health care practitioners to readily identify appropriate therapies for patients and to support clinical decision making (Sohn and Liu, 2014). Drug indications provide guidance for clinical knowledge management and play an essential role in enabling the secondary use of electronic medical records (EMRs) for clinical translational research. Indication information extraction is a regulatory requirement for creating the highlights section of Physician Labeling Rule (PLR) labeling, which provides concise information for public health practitioners, patients, and drug reviewers (https://www.fda.gov/drugs/laws-acts-and-rules/prescription-drug-labeling-resources). Developing an effective approach to mining drug indication information from free text-based resources is therefore an important task for biomedical natural language processing (NLP).
