Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost.

Minjun Chen, Nicholas Mann, Wenjun Bao, Joshua Xu, Thomas J Pedersen, Yue Wu, Tom Donnelly, Zhichao Liu, Weida Tong, Russell D Wolfinger, Byron Wingerd, Shraddha Thakkar

doi:10.3389/frai.2024.1401810

Abstract

Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources. In this study, we used artificial intelligence to classify drug-induced liver injury (DILI)-related content in drug labeling documents, based on the FDA's DILIrank dataset. We employed text mining and XGBoost models, and used the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining standard medical terms for the FDA and EMA drug label datasets. We then constructed a document-term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token. The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both the FDA and EMA drug labels and the literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA). Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
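To make the document-term matrix step concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple regex tokenizer, single-word terms, term frequency normalized by the number of retained tokens, and an unsmoothed inverse document frequency of log(N / df); the abstract does not specify the exact weighting. The three example sentences and the four-term vocabulary are hypothetical stand-ins for drug label text and for the retained standard medical terms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(documents, vocabulary):
    """Build a document-term matrix restricted to `vocabulary`.

    Returns (terms, rows), where rows[i][j] is the TF-IDF weight of
    terms[j] in documents[i]. Tokens outside the vocabulary are dropped,
    which mimics removing common words while keeping standard terms.
    """
    terms = sorted(vocabulary)
    counts = [
        Counter(tok for tok in tokenize(doc) if tok in vocabulary)
        for doc in documents
    ]
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = {t: sum(1 for c in counts if c[t] > 0) for t in terms}

    rows = []
    for c in counts:
        total = sum(c.values())  # retained tokens in this document
        row = []
        for t in terms:
            if total == 0 or doc_freq[t] == 0:
                row.append(0.0)
            else:
                tf = c[t] / total
                idf = math.log(n_docs / doc_freq[t])  # assumed, unsmoothed IDF
                row.append(tf * idf)
        rows.append(row)
    return terms, rows


# Hypothetical label excerpts and vocabulary, for illustration only.
documents = [
    "Severe hepatotoxicity and jaundice have been reported.",
    "Nausea and headache were the most common reactions.",
    "Cases of hepatitis and jaundice occurred; nausea was also reported.",
]
vocabulary = {"hepatotoxicity", "jaundice", "hepatitis", "nausea"}

terms, rows = tfidf_matrix(documents, vocabulary)
for row in rows:
    print([round(w, 3) for w in row])
```

Each row of the resulting matrix is a feature vector for one document, which could then be passed to a gradient-boosted classifier such as XGBoost. Multi-word terms would need phrase matching before tokenization, which this sketch omits.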

Journal: Frontiers in Artificial Intelligence
Publication Date: Jun 3, 2024
License type: CC BY 4.0
