Diabetes mellitus (DM) is a group of metabolic disorders characterized by elevated blood glucose levels over a prolonged period. Undiagnosed DM can give rise to a host of associated complications such as retinopathy, nephropathy, neuropathy and other vascular abnormalities. Against this background, machine learning (ML) approaches can play an essential role in the early detection, diagnosis and therapeutic monitoring of the disease. Recently, several studies have proposed methods to predict the onset of DM. To this end, we develop a stacking-based evolutionary ensemble learning system, “NSGA-II-Stacking”, for predicting the onset of Type-2 diabetes mellitus (T2DM) within five years. For this purpose, the publicly accessible Pima Indian diabetes (PID) dataset is utilized. As a data pre-processing step, missing values and outliers are identified and replaced with median values. For base learner selection, a multi-objective optimization algorithm is utilized that simultaneously maximizes the classification accuracy and minimizes the ensemble complexity. For model combination, k-nearest neighbor (K-NN) is employed as a meta-classifier that combines the predictions of the base learners. The comparative results demonstrate that the proposed NSGA-II-Stacking method significantly outperforms several individual ML approaches and conventional ensemble approaches. In terms of performance metrics, the proposed system achieves the highest accuracy of 83.8 %, sensitivity of 96.1 %, specificity of 79.9 %, F-measure of 88.5 % and area under the ROC curve of 85.9 %.
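The two-objective trade-off used for base learner selection can be illustrated with a minimal sketch. It shows only the non-dominated (Pareto) filtering idea, not the full NSGA-II procedure, and it rests on assumptions not stated in the abstract: ensemble complexity is taken to be the number of base learners, and the `Candidate` type, the learner names and the accuracy values are hypothetical placeholders rather than results from the study.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A candidate ensemble: a subset of base learners and its accuracy."""
    learners: Tuple[str, ...]  # hypothetical base-learner names
    accuracy: float            # objective 1: to be maximized

    @property
    def complexity(self) -> int:
        # Objective 2: to be minimized. Assumed here to be the ensemble size.
        return len(self.learners)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.complexity <= b.complexity
    strictly_better = a.accuracy > b.accuracy or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up accuracies, for illustration only.
candidates = [
    Candidate(("lr",), 0.76),
    Candidate(("lr", "svm"), 0.79),
    Candidate(("lr", "svm", "tree"), 0.79),   # same accuracy, larger ensemble
    Candidate(("tree", "nb"), 0.74),          # less accurate and larger than ("lr",)
    Candidate(("lr", "svm", "tree", "nb"), 0.81),
]

front = pareto_front(candidates)
for c in front:
    print(c.learners, c.accuracy, c.complexity)
```

In this toy input, the three-learner ensemble is dropped because a two-learner ensemble reaches the same accuracy, which is the kind of trade-off a multi-objective selection step is meant to expose. A stacking system would then train the meta-classifier on the predictions of a chosen front member.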