A typical sample-driven learning framework for automatic disease diagnosis

Chenwei Yan, Xinxin You, Xiangling Fu, Xien Liu, Ji Wu

doi:10.1016/j.asoc.2024.111745

Abstract

Disease diagnosis mainly depends on a doctor's medical knowledge and clinical experience, and it can be treated as a medical text classification task. We observe that existing data-driven methods suffer from distribution bias: in the real world, a small number of common diseases appear frequently while most diseases are infrequent, which leads to an imbalanced data distribution in the disease diagnosis task. To address this problem, we propose a new learning framework, the Typical sample-Driven Graph Neural Network (TD-GNN), for disease knowledge representation and classification. Unlike previous methods, our framework concretizes each disease (label) and learns it from several well-representative samples of that disease rather than from the full imbalanced data. In addition, a contrastive learning strategy is used to enhance the learning of features that distinguish different diseases. In this study, we construct a real-world dataset covering 350 common diseases to evaluate the proposed learning method. The experimental results demonstrate that TD-GNN significantly outperforms state-of-the-art baselines, especially for the majority of diseases, for which only a small number of samples can be collected in the real world. Additionally, our method provides a sample-based interpretation of its disease predictions.
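The abstract does not say how typical samples are selected or how the contrastive objective is defined, so the following is only a hypothetical sketch of the general idea, not the TD-GNN method. It assumes that a sample's "typicality" is its cosine similarity to the mean embedding of its disease, represents each disease by the mean of its k most typical samples (a prototype), and uses a standard InfoNCE-style loss over these prototypes as a stand-in for the contrastive strategy. All function names are illustrative, and the graph neural network that produces the embeddings is not modeled.

```python
import math
from collections import defaultdict


def _normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def _mean(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def select_typical_samples(embeddings, labels, k):
    """Hypothetical rule: keep the k samples per label closest to its centroid."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    typical = {}
    for y, idxs in by_label.items():
        centroid = _mean([embeddings[i] for i in idxs])
        ranked = sorted(idxs, key=lambda i: _cosine(embeddings[i], centroid), reverse=True)
        typical[y] = ranked[:k]
    return typical


def build_prototypes(embeddings, typical):
    # One prototype per label, built only from that label's typical samples.
    return {y: _mean([embeddings[i] for i in idxs]) for y, idxs in typical.items()}


def predict(x, prototypes):
    # Assign x to the label whose prototype is most cosine-similar.
    return max(prototypes, key=lambda y: _cosine(x, prototypes[y]))


def prototype_contrastive_loss(x, y, prototypes, temperature=0.1):
    """InfoNCE-style loss: pull x toward prototype y, push it from the others."""
    logits = {c: _cosine(x, p) / temperature for c, p in prototypes.items()}
    m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_denom - logits[y]


# Toy usage: "flu" has an outlier sample (index 2) that is not selected as typical.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.1, 1.0]]
labels = ["flu", "flu", "flu", "rare", "rare"]
typical = select_typical_samples(embeddings, labels, k=2)
prototypes = build_prototypes(embeddings, typical)
print(typical["flu"])                    # [1, 0]
print(predict([0.8, 0.2], prototypes))   # flu
```

Because every label is summarized by at most k samples, a frequent disease and a rare one contribute equally sized evidence to their prototypes, which is one simple way to reduce the pull of the majority classes described above; the real framework may choose and use its typical samples differently.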

Full Text