Abstract

Noncommunicable diseases (NCDs) have become the leading cause of death worldwide. The chronic, hidden, and irreversible nature of NCDs makes patients' awareness of their own disease extremely important for disease control but hard to achieve. With the accumulation of electronic health record (EHR) data, it has become possible to predict NCDs early through machine learning approaches. However, EHR data from latent NCD patients are often irregularly sampled in time, and the data sequences are short and imbalanced, which prevents researchers from using such data fully and effectively. Here, we outline the characteristics of typical short sequential data for NCD early prediction and emphasize the importance of using such data in machine learning schemes. We then propose a novel NCD early prediction method: the short sequential medical data-based early prediction method (SSEPM). The SSEPM network contains two stacked subnetworks for multilabel enhancement. In each subnetwork, long short-term memory (LSTM) and attention layers are implemented to extract both temporal and nontemporal embedded features. During training, a random connection (RC) process that draws on prior clinical knowledge of NCD characteristics is proposed for data augmentation. Comparative experiments with ten-fold cross-validation are performed on real-world medical data to predict five NCDs. The results show that the SSEPM outperforms state-of-the-art NCD early prediction algorithms and works well with short sequential data. The results also suggest that using short sequential data directly could be more effective than formatting datasets with temporal exclusion limitations.
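To make the notion of short, irregularly sampled sequences concrete, the following minimal sketch shows one common way such records might be prepared and summarized: each visit is encoded with the time gap since the previous visit, sequences are padded to a fixed length with a mask, and an attention-style weighted average pools the real visits. This is an illustration only and not the SSEPM architecture; the function names `pad_visits` and `masked_attention_pool`, the feature values, and the fixed `query` vector (which would be learned in a real model) are all assumptions introduced here.

```python
import math


def pad_visits(visits, max_len):
    """Turn irregularly timed visits into fixed-length rows plus a mask.

    visits: non-empty list of (day, [features]) tuples sorted by day.
    Each output row is [days_since_previous_visit, *features].
    """
    n_features = len(visits[0][1])
    rows, mask = [], []
    prev_day = visits[0][0]
    for day, features in visits[:max_len]:
        rows.append([float(day - prev_day)] + list(features))
        mask.append(1)  # 1 marks a real visit
        prev_day = day
    while len(rows) < max_len:
        rows.append([0.0] * (n_features + 1))
        mask.append(0)  # 0 marks padding
    return rows, mask


def masked_attention_pool(rows, mask, query):
    """Softmax-weighted average of rows; padded rows get zero weight."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in rows]
    # Subtract the largest real score for numerical stability.
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * row[i] for w, row in zip(weights, rows))
        for i in range(len(query))
    ]
    return pooled, weights


# Hypothetical patient with three visits at days 0, 45, and 400;
# the two features per visit are made-up lab values.
patient = [(0, [5.6, 120.0]), (45, [6.1, 130.0]), (400, [6.8, 135.0])]
rows, mask = pad_visits(patient, max_len=5)

# Fixed query that attends to the first lab value (assumption, not learned).
query = [0.0, 1.0, 0.0]
pooled, weights = masked_attention_pool(rows, mask, query)
```

Keeping the time gap as an explicit feature lets a downstream model see the irregular sampling instead of assuming evenly spaced visits, and the mask ensures that padding added to short sequences does not influence the pooled summary.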
