This study compares machine learning models for predicting diabetes. It evaluates two advanced ensemble techniques, the Extra Trees Classifier and LightGBM, against traditional machine learning algorithms and simpler deep neural network (DNN) architectures. The dataset comprises features pertinent to diabetes diagnosis, including glucose concentration, BMI, and insulin levels. Polynomial feature transformation and ten-fold cross-validation were used to support the reliability of the results and to assess how well the models generalize. Extra Trees achieved the strongest results, with ROC AUC, accuracy, precision, and F1 score all close to 1. LightGBM followed closely, demonstrating the high efficacy of ensemble methods in complex data settings. By contrast, the DNNs scored significantly lower, while traditional models such as Decision Trees, Random Forest, KNN, and XGBoost produced respectable but lower scores.
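The two methodological steps named above, polynomial feature transformation and ten-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the study's code: the polynomial degree (2), the unstratified shuffled split, the sample values, and the helper names `polynomial_features` and `k_fold_indices` are all assumptions made for illustration. In practice, libraries such as scikit-learn provide `PolynomialFeatures` and `KFold` for the same purpose.

```python
from itertools import combinations_with_replacement
import random


def polynomial_features(row, degree=2):
    """Expand one feature vector with all monomials up to `degree`.

    The degree is an assumption; the abstract does not state it.
    The output starts with a bias term of 1.0.
    """
    expanded = [1.0]
    for d in range(1, degree + 1):
        # Each combination of feature indices defines one monomial.
        for combo in combinations_with_replacement(range(len(row)), d):
            term = 1.0
            for idx in combo:
                term *= row[idx]
            expanded.append(term)
    return expanded


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is held out once.

    Plain shuffled k-fold; whether the study stratified by class is not stated.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical, made-up values standing in for (glucose, BMI, insulin).
sample = [2.0, 3.0, 5.0]
expanded = polynomial_features(sample, degree=2)

# Ten train/test splits over 25 hypothetical samples.
splits = list(k_fold_indices(25, k=10))
```

With three input features and degree 2, the expansion yields ten terms: the bias, the three original features, and six squared or pairwise products. A model would then be fitted on each training split and scored on the matching held-out split, with the ten scores averaged.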