Abstract

Background: Early diagnosis of diabetes complications is clinically demanding and of great significance. Given the complexity of diabetes complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously from electronic health record (EHR) data, and leveraged the correlations between the complications to further improve prediction accuracy.

Methods: We obtained demographic characteristics and laboratory data from the EHRs of patients admitted to Changzhou No. 2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China, from May 2013 to June 2020. The data covered 9,765 patients and 93 biochemical indicators. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We then used MLC models built on the Random Forest (RF) technique to leverage these correlations and predict the four complications simultaneously. We explored four MLC models: Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR), with traditional Binary Relevance (BR) as the comparison baseline. We evaluated the models with 11 performance metrics and the area under the receiver operating characteristic curve (AUROC). We analyzed the weights of the learned model to illustrate (1) the top 10 key indicators of the different complications and (2) the correlations between the different diabetic complications.

Results: The CC, ECC, and CLR models outperformed the traditional BR method on most performance metrics. The ECC model performed best on Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231). The two diabetic complication correlation matrices, one drawn from the PCC analysis and one from the MLC models, were consistent with each other and indicated that the complications are correlated to different extents. The top 10 key indicators given by the model are valuable for medical applications.

Conclusions: Our MLC model can effectively exploit the potential correlations between different diabetic complications to further improve prediction accuracy. This model should be explored further in other complex diseases with multiple complications.
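The comparison between Binary Relevance and a chain-based ensemble described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_multilabel_classification` as a stand-in for the EHR dataset; the sample size, tree count, number of chains, and 0.5 threshold are illustrative choices, not the study's settings, and the ensemble shown (averaging chains with random label orders) is one common way to build an ECC rather than the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic stand-in for the EHR data (NOT the study's dataset):
# 93 features mirror the indicator count, 4 labels mirror the four complications.
X, Y = make_multilabel_classification(
    n_samples=600, n_features=93, n_classes=4, n_labels=2, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)


def make_rf(seed):
    """Random Forest base learner; hyperparameters are assumptions."""
    return RandomForestClassifier(n_estimators=50, random_state=seed)


# Binary Relevance: one independent Random Forest per label.
br = MultiOutputClassifier(make_rf(0)).fit(X_train, Y_train)
Y_br = br.predict(X_test)

# Ensemble of classifier chains: each chain feeds earlier label predictions
# to later classifiers, using a different random label order per chain.
chains = [
    ClassifierChain(make_rf(i), order="random", random_state=i).fit(X_train, Y_train)
    for i in range(5)
]
# Average the per-label probabilities over chains, then threshold at 0.5.
Y_prob = np.mean([chain.predict_proba(X_test) for chain in chains], axis=0)
Y_ecc = (Y_prob >= 0.5).astype(int)

scores = {
    name: {
        "hamming_loss": hamming_loss(Y_test, Y_pred),
        "f1_micro": f1_score(Y_test, Y_pred, average="micro"),
    }
    for name, Y_pred in [("BR", Y_br), ("ECC", Y_ecc)]
}
for name, metrics in scores.items():
    print(name, metrics)
```

On synthetic data the two approaches may score similarly; any gain from chaining depends on how strongly the labels are actually correlated.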

Highlights

  • Complications of diabetes are the leading cause of death in diabetic patients [1], with 76.4% of diabetic patients reporting at least one complication [2]

  • Modern electronic health records (EHRs) [5, 6] are a rich resource of clinical data from which newer physical indicators can be identified as predictors of diabetic complications to assist in treatment planning

  • Bai et al. were the first to frame diabetic complication classification as a multi-label classification (MLC) problem and to apply multi-label methods to predicting diabetes complications from EHRs. Their results indicated that random k-label sets and chained classifiers performed better than binary relevance, least combination, and pruned sets [33]

  • We aimed to identify the best MLC model to predict diabetic complications and inform clinical decisions that could help personalize type 2 diabetes management


Summary

Introduction

Complications of diabetes are the leading cause of death in diabetic patients [1], with 76.4% of diabetic patients reporting at least one complication [2]. Modern electronic health records (EHRs) [5, 6] are a rich resource of clinical data from which newer physical indicators can be identified as predictors of diabetic complications to assist in treatment planning. In earlier studies, however, each diabetic complication was modeled and predicted independently, which made it impossible to leverage the potential correlations among diabetes complications. Given the complexity of diabetes complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously from EHR data, and leveraged the correlations between the complications to further improve prediction accuracy.
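As a rough illustration of what "correlations among complications" means in practice, the Pearson correlation between binary complication labels can be computed directly from a patient-by-label matrix. The sketch below assumes NumPy and uses made-up labels; the names `complication_A` to `complication_D` and the simulated dependencies are hypothetical and do not reflect the study's data.

```python
import numpy as np

# Hypothetical label names; the toy data below is simulated, not clinical.
label_names = ["complication_A", "complication_B", "complication_C", "complication_D"]

rng = np.random.default_rng(0)
n_patients = 500

# A and B are built to co-occur often; C and D are loosely linked;
# the two pairs are independent of each other.
a = rng.random(n_patients) < 0.4
b = a ^ (rng.random(n_patients) < 0.1)   # flip about 10% of A's values
c = rng.random(n_patients) < 0.3
d = c | (rng.random(n_patients) < 0.2)   # D includes C plus extra cases

# Rows are patients, columns are binary complication labels.
Y = np.column_stack([a, b, c, d]).astype(int)

# Pearson correlation between label columns (for binary variables this
# equals the phi coefficient).
corr = np.corrcoef(Y, rowvar=False)

for name, row in zip(label_names, corr):
    print(name, np.round(row, 2))
```

A matrix like `corr` is the kind of statistical summary that can be compared against the label dependencies a multi-label model learns.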

