A precision‐preferred comprehensive information extraction system for clinical articles in traditional Chinese Medicine

Ye Xia, Jiayi Sun, Lin Wu, Yixing Liu, Yizhen Li, Darong Wu, Zhili Dou, Jianxiong Cai, Zhe Huang, Dongran Han, Shujing Xu, Yunan Zhang

doi:10.1002/int.22748

Abstract

This study established a precision-preferred system designed specifically for extracting data from traditional Chinese medicine (TCM) clinical articles, providing foundational data for the subsequent analysis of clinical articles and the synthesis of TCM clinical evidence. Information extraction is widely used in many fields to identify relevant concepts, and the relationships between pairs of concepts, in large information sources. Previous information extraction studies focused primarily on a scattered set of target fields and sought to balance precision against recall. In contrast, this study aims to create a comprehensive information extraction system for TCM articles. The system extracts all relevant information from research articles across a broad research field, covering the 11 diseases that TCM can treat effectively, with high precision, while efficiently assessing bias in every study. It captures the most essential information related to the patients, interventions, comparisons, outcomes, and study design (PICOS) principles of TCM clinical trials, spanning 34 target fields on 14 topics. Obstacles such as the varied typesetting of TCM clinical articles were handled by combining machine vision with optical character recognition (OCR). TCM researchers can thus be spared laborious, unscalable, and inefficient manual extraction. By analyzing the overall information integrity of TCM clinical articles, the system can also make TCM researchers more aware of frequently missing information and of clinical trial design methods that could introduce bias, which can benefit future research designs.
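The abstract does not describe the extraction algorithm itself. As one way to picture what "precision-preferred" extraction and an information-integrity analysis could look like, the Python sketch below accepts a candidate value for a field only when its confidence clears a high threshold (abstaining otherwise), then reports how often each field is left empty across articles. The field names, their grouping by PICOS element, the 0.9 threshold, the candidate scores, and the helper functions `accept`, `extract_record`, and `missing_rates` are all hypothetical assumptions for illustration; they are not the authors' 34 fields or their method.

```python
from collections import Counter
from typing import Optional

# Hypothetical confidence cutoff: a high threshold trades recall for precision.
PRECISION_THRESHOLD = 0.9

# Hypothetical subset of target fields, grouped by PICOS element (not the paper's 34 fields).
TARGET_FIELDS = {
    "participants": ["disease", "sample_size"],
    "intervention": ["tcm_formula"],
    "comparison": ["control_treatment"],
    "outcomes": ["primary_outcome"],
    "study_design": ["randomization_method", "blinding"],
}


def accept(candidates: list[tuple[str, float]]) -> Optional[str]:
    """Return the highest-confidence candidate if it clears the threshold, else None (abstain)."""
    if not candidates:
        return None
    value, score = max(candidates, key=lambda c: c[1])
    return value if score >= PRECISION_THRESHOLD else None


def extract_record(raw: dict[str, list[tuple[str, float]]]) -> dict[str, Optional[str]]:
    """Apply the precision-preferred rule to every target field of one article."""
    return {
        field: accept(raw.get(field, []))
        for fields in TARGET_FIELDS.values()
        for field in fields
    }


def missing_rates(records: list[dict[str, Optional[str]]]) -> dict[str, float]:
    """Fraction of articles in which each field was left empty (an information-integrity view)."""
    if not records:
        return {}
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if value is None:
                missing[field] += 1
    return {field: missing[field] / len(records) for field in records[0]}


# Usage with made-up candidate values and scores for two articles.
articles = [
    {"disease": [("disease_a", 0.97)], "sample_size": [("120", 0.95)],
     "randomization_method": [("random number table", 0.62)]},
    {"disease": [("disease_b", 0.93)], "blinding": [("double-blind", 0.91)]},
]
records = [extract_record(a) for a in articles]
rates = missing_rates(records)
print(rates["disease"], rates["randomization_method"])  # 0.0 1.0
```

One design consequence worth noting: in this sketch a field that is present in an article but extracted with low confidence counts as missing, so the reported missing rate mixes genuine reporting omissions with abstentions by the extractor. A real integrity analysis would need to separate the two.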
