Abstract

Deep learning techniques can capture complex input-output relationships and have been widely applied to diagnosis prediction from web-based patient electronic health record (EHR) data. To improve the predictive performance and interpretability of purely data-driven deep learning when only a limited amount of labeled data is available, a prevalent trend is to guide model training with knowledge priors from online medical knowledge graphs. However, existing methods have largely overlooked two issues: label imbalance and task-irrelevant noise in the external knowledge graph. An imbalanced label distribution biases learning and knowledge extraction towards the majority categories, while task-irrelevant noise introduces extra uncertainty (variance) into model performance. To bypass this bias-variance trade-off dilemma, we introduce a new sequential learning framework, dubbed SeqCare, for diagnosis prediction with online medical knowledge graphs. Concretely, in the first step, SeqCare learns a bias-reduced space through a self-supervised graph contrastive learning task. Secondly, SeqCare reduces learning uncertainty by simultaneously refining the supervision signal and the graph structure of the knowledge graph. Lastly, SeqCare trains the model in the bias- and variance-reduced space with self-distillation to further filter out irrelevant information in the data. Experimental evaluations on two real-world datasets show that SeqCare outperforms state-of-the-art approaches. Case studies illustrate the interpretability of SeqCare. Moreover, the medical findings discovered by SeqCare are consistent with expert opinion and the medical literature.
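The abstract does not specify SeqCare's exact training objectives. As a minimal sketch, assuming a standard InfoNCE-style contrastive loss for the first (self-supervised graph contrastive) step and a temperature-scaled KL divergence for the last (self-distillation) step, the two losses could look like the code below. Both are generic, widely used techniques and are not necessarily the formulations SeqCare uses; the function names `info_nce` and `self_distillation_kl` and the temperatures `tau` and `t` are hypothetical. Node embeddings and logits are plain Python lists to keep the example dependency-free.

```python
import math
from typing import List

Vector = List[float]


def _cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(view_a: List[Vector], view_b: List[Vector], tau: float = 0.5) -> float:
    """Hypothetical InfoNCE loss between two augmented views of the same graph nodes.

    Node i in view_a and node i in view_b form the positive pair; every other
    node in view_b acts as a negative for node i. Lower is better.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [_cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / n


def softmax(logits: Vector, t: float = 1.0) -> Vector:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def self_distillation_kl(teacher_logits: Vector, student_logits: Vector, t: float = 2.0) -> float:
    """Hypothetical self-distillation loss: KL(teacher || student) on softened
    distributions, scaled by t^2 as is common in knowledge distillation.
    The teacher is assumed to be an earlier snapshot of the same model."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return t * t * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, `info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])` is smaller than the same call with the second view's rows swapped, because matching nodes are then the most similar pairs. In a full training loop these scalar losses would be computed on tensors with an autodiff framework; the list-based version only illustrates the arithmetic.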