TransLSTD: Augmenting hierarchical disease risk prediction model with time and context awareness via disease clustering

Tao You, Peng Zhang, Wei Huang, Qiaodong Dang, Guanzhong Wu, Qing Li

doi:10.1016/j.is.2024.102390

Abstract

Electronic health records (EHRs) are now in widespread use and provide a valuable source of information for predicting disease risk. Deep neural network models, supplemented with medical domain knowledge for interpretability, have been proposed and shown to be effective for this task, but several limitations remain. First, chronic and acute diseases are often not differentiated, which leads to biased modeling of diseases. Second, temporal patterns are extracted from patient data at only a single level, which hinders comprehensive representation and limits predictive power. Third, the weak interpretability of deep neural networks prevents the extraction of valuable medical knowledge and limits practical applications. To overcome these challenges, we propose TransLSTD, a hierarchical model that incorporates time awareness and context awareness while distinguishing between long-term and short-term diseases. TransLSTD uses clustering algorithms to classify disease types based on the occurrence feature matrix of diseases in the EHR dataset, and it updates disease representations at the code level while creating patient visit embeddings. The model uses query vectors to incorporate visit context information and combines this with time data to capture the patient’s overall health status. Finally, the prediction module generates outcomes and provides effective interpretations. Experiments on two real-world datasets show that TransLSTD outperforms state-of-the-art models in terms of both AUC and F1. The data and code are released at https://github.com/DangQD/TransLSTD-master.
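As a rough illustration of the disease clustering step described above, the following sketch groups disease codes into long-term and short-term types from simple occurrence statistics. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the exact occurrence features, so the two features used here (how often a code recurs within a patient, and the share of a patient's visits that contain it), the plain 2-means routine, and all function names and toy data are assumptions made only for this example.

```python
from collections import defaultdict


def occurrence_features(patients):
    """Build a hypothetical occurrence feature per disease code.

    patients: list of visit sequences; each visit is a set of disease codes.
    Returns {code: (recurrence_rate, visit_coverage)}.
    """
    n_patients_with = defaultdict(int)
    n_recurring = defaultdict(int)
    coverage_sum = defaultdict(float)
    for visits in patients:
        counts = defaultdict(int)
        for visit in visits:
            for code in visit:
                counts[code] += 1
        for code, c in counts.items():
            n_patients_with[code] += 1
            if c > 1:  # the code appears in more than one visit
                n_recurring[code] += 1
            coverage_sum[code] += c / len(visits)
    return {
        code: (n_recurring[code] / n, coverage_sum[code] / n)
        for code, n in n_patients_with.items()
    }


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    dists = [squared_distance(point, c) for c in centroids]
    return dists.index(min(dists))


def two_means(points, iters=20):
    """Plain 2-means with a deterministic start (assumed for this sketch)."""
    centroids = [min(points), max(points)]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def classify_diseases(patients):
    """Label each code 'long-term' or 'short-term' by its cluster."""
    feats = occurrence_features(patients)
    centroids = two_means(list(feats.values()))
    # Assumption: the cluster with the higher recurrence rate is long-term.
    long_idx = 0 if centroids[0][0] >= centroids[1][0] else 1
    return {
        code: "long-term" if nearest(p, centroids) == long_idx else "short-term"
        for code, p in feats.items()
    }


# Toy data, invented for illustration only.
patients = [
    [{"diabetes", "flu"}, {"diabetes"}, {"diabetes", "fracture"}],
    [{"diabetes"}, {"diabetes", "flu"}],
    [{"hypertension"}, {"hypertension", "fracture"}, {"hypertension"}],
]
labels = classify_diseases(patients)
```

In this toy data, codes that recur across a patient's visits fall into one cluster and codes that appear once fall into the other. A real pipeline would likely use richer occurrence features and a library clustering routine.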

Full Text