Abstract
Objectives
Cardiovascular disease (CVD) is one of the major causes of death worldwide. To improve the accuracy of CVD prediction, risk classification was performed using national time-series health examination data. These data offer an opportunity to apply deep learning (RNN-LSTM), which is widely regarded as well suited to analyzing time-series datasets. The objective of this study was to show the improved accuracy of deep learning by comparing the performance of Cox hazard regression and RNN-LSTM based on survival analysis.
Methods and findings
We selected 361,239 subjects (age 40 to 79 years) with more than two health examination records from 2002–2006 using the National Health Insurance System-National Health Screening Cohort (NHIS-HEALS). The average number of health screenings (from 2002–2013) used in the analysis was 2.9 ± 1.0. Two CVD prediction models were developed from the NHIS-HEALS data: a Cox hazard regression model and a deep learning model. In an internal validation of the NHIS-HEALS dataset, the Cox regression model showed a highest time-dependent area under the curve (AUC) of 0.79 (95% CI 0.70 to 0.87) in females and 0.75 (95% CI 0.70 to 0.80) in males at 2 years. The deep learning model showed a highest time-dependent AUC of 0.94 (95% CI 0.91 to 0.97) in females and 0.96 (95% CI 0.95 to 0.97) in males at 2 years. Layer-wise Relevance Propagation (LRP) revealed that age was the variable with the greatest effect on CVD, followed by systolic blood pressure (SBP) and diastolic blood pressure (DBP), in that order.
Conclusion
The deep learning model predicted CVD occurrences better than the Cox regression model. In addition, LRP applied to the study results recovered the known risk factors that previous clinical studies have shown to be important.
Highlights
Cardiovascular disease (CVD) is one of the leading causes of mortality worldwide [1]
Two CVD prediction models were developed from the National Health Insurance System (NHIS)-HEALS data: a Cox hazard regression model and a deep learning model
In an internal validation of the NHIS-HEALS dataset, the Cox regression model showed a highest time-dependent area under the curve (AUC) of 0.79 in females and 0.75 in males at 2 years
Summary
Cardiovascular disease (CVD) is one of the leading causes of mortality worldwide [1]. Various prediction models have been developed to identify individuals who have a high risk of developing CVD, and Cox hazard regression analysis has been the traditional approach [2,3,4,5,6,7]. Cox hazard regression models have been used to identify risk factors in terms of risk ratios and to provide a probability that an individual will develop CVD, enabling personalized treatment for high-risk individuals [8]. The selected risk factors are measured at pre-planned times, so information on the collected risk factors can be fully used by statistical methods. Because the types and cycles of risk factor measurements vary across clinical studies, existing statistical models do not have all the information on CVD risk, and only parts of those databases are available. Appropriate analysis methods for maximizing predictive performance with these multi-measurement datasets have not been clearly defined.
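The two outputs of a Cox model mentioned above, a ratio per risk factor and an individual probability of developing CVD, can be illustrated with a minimal sketch. It assumes the standard proportional hazards form, in which the hazard is a baseline hazard multiplied by exp(β·x). The function names, coefficients, covariate means, and baseline survival below are hypothetical values chosen for illustration; they are not taken from the study.

```python
import math


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a `delta`-unit increase in one covariate.

    Under proportional hazards this is exp(beta * delta).
    """
    return math.exp(beta * delta)


def cox_risk(baseline_survival, betas, x, x_mean):
    """Probability of an event by a fixed time horizon for one individual.

    baseline_survival: survival probability at the horizon for a person
        whose covariates equal x_mean (hypothetical value here).
    betas, x, x_mean: coefficients, individual covariates, covariate means.
    """
    # Linear predictor, centred on the covariate means.
    lp = sum(b * (xi - mi) for b, xi, mi in zip(betas, x, x_mean))
    # S(t | x) = S0(t) ** exp(lp); risk is the complement of survival.
    return 1.0 - baseline_survival ** math.exp(lp)


if __name__ == "__main__":
    # Hypothetical coefficients for age (years) and SBP (mmHg).
    betas = [0.05, 0.01]
    means = [60.0, 125.0]
    print("HR per 10 years of age:", hazard_ratio(betas[0], 10))
    print("Risk, age 70, SBP 150:", cox_risk(0.95, betas, [70.0, 150.0], means))
```

A person whose covariates equal the means gets a risk of exactly one minus the baseline survival, and the risk rises as covariates with positive coefficients increase. Note that this form uses a single covariate vector per person, which is why repeated examinations are hard to exploit fully in this setting.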