Abstract

Diabetes mellitus is a major public health problem worldwide because of its high prevalence and medical costs. Prevention efforts require reliable risk assessment models that can identify high-risk individuals and enable healthcare practitioners to initiate appropriate preventive interventions. However, data-driven diabetes risk assessment models face several challenges, such as class imbalance and a low identification rate. To address these challenges, this paper proposes an analytical framework based on data-driven approaches, using large-scale population data from the Henan Rural Cohort Study. A joint bagging-boosting model (JBM) was developed and validated. To facilitate large-scale population screening, we excluded laboratory variables and removed collinear variables using the maximum likelihood ratio method, retaining only easily accessible variables. We then explored the effects of different methods for handling the imbalanced nature of the available data, including over-sampling and under-sampling. Finally, to improve overall performance, we constructed a joint model that combines bagging and boosting algorithms through stacking. The model showed good discrimination, with an area under the curve (AUC) of 0.885, and acceptable calibration (Brier score = 0.072). Compared with the benchmark model, the proposed framework improved the AUC by 13.5%, and recall increased from 0.744 to 0.847. The proposed model contributes to the personalized management of diabetes, especially in resource-poor medical settings.
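
The abstract names two building blocks that are easy to misread without a concrete form: resampling to counter class imbalance, and evaluation by AUC, Brier score and recall. The sketch below is a minimal, standard-library-only illustration of random over-sampling, random under-sampling and those three metrics. It is not the authors' code and does not reproduce the JBM stacking step. The function names (`random_oversample`, `random_undersample`, `auc`, `brier`, `recall`) are hypothetical. It assumes binary labels (1 = diabetes, 0 = no diabetes), with both classes present, and a 0.5 decision threshold for recall.

```python
import random
from typing import List, Sequence, Tuple

Rows = List[list]


def _split_by_class(y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (minority_indices, majority_indices); assumes binary labels 0/1."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_oversample(X: Rows, y: Sequence[int], seed: int = 0) -> Tuple[Rows, List[int]]:
    """Duplicate randomly drawn minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    # Assumption: the minority class is non-empty, otherwise rng.choice fails.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X: Rows, y: Sequence[int], seed: int = 0) -> Tuple[Rows, List[int]]:
    """Keep every minority row plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    idx = minority + rng.sample(majority, len(minority))
    return [X[i] for i in idx], [y[i] for i in idx]


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, label in zip(p, y) if label == 1]
    neg = [s for s, label in zip(p, y) if label == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Brier score: mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def recall(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Share of actual positives predicted positive; the 0.5 threshold is an assumption."""
    tp = sum(1 for pi, yi in zip(p, y) if yi == 1 and pi >= threshold)
    positives = sum(1 for yi in y if yi == 1)
    return tp / positives


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print(auc(y_true, y_prob))     # 0.75
    print(recall(y_true, y_prob))  # 0.5
```

Resampling of this kind is normally applied only to the training split. The test set keeps its natural class ratio so that AUC, Brier score and recall reflect the population being screened.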
