Eliciting symptom‐diagnosis knowledge from online medical Q&A

Ying Yu, Hao Huang

doi:10.1111/exsy.12821

Abstract

To automatically detect diseases from symptoms in free‐text data, this paper proposes a methodology for extracting symptom‐diagnosis knowledge from online medical text in the Q&A domain: (1) a term frequency‐inverse document frequency and PRECISION method is adopted to retrieve symptom words from unstructured text; (2) a variable precision rough set based genetic algorithm is applied to remove redundant symptom words, and a rough set based rule is used to add discriminative symptom words that help distinguish diseases sharing similar symptoms; (3) fuzzy linguistic variables are employed to express the risk level of a disease or the severity level of its symptoms, yielding a knowledge base with a fuzzy belief structure. Using data extracted from a Chinese medical Q&A forum for training and testing, several classical gastrointestinal diseases serve as a case study to evaluate the efficiency of the proposed methodology. Its performance is then compared with that of other classifiers, such as the decision tree algorithms ID3 and J45 and the Bayesian network classifier. The comparative results show that the proposed methodology outperforms both the decision tree algorithms and the Bayesian network classifier.

Full Text