Predictive Modeling and Analytics for Diabetes using Hyperparameter tuned Machine Learning Techniques

Subhash Chandra Gupta, Noopur Goel

doi:10.1016/j.procs.2023.01.104

Abstract

The accuracy of a classifier is important for the success of any prediction model: the more accurate the classifier, the more robust the system built on it. In this paper, a disease prediction model is developed in Python for classifying diabetes in patients, and a comparative analysis of the performance of machine learning classification algorithms is carried out. Classifier performance is enhanced by tuning the classifiers' hyperparameters and by applying different dataset preprocessing methods. In this experimental analysis, four models are created, each based on a dataset obtained by applying a different preprocessing method to the PIMA dataset. For each model, the K-Nearest Neighbors, Decision Tree, Random Forest, and Support Vector Machine classification algorithms are applied, and their hyperparameters are tuned to obtain better results. A detailed analysis is also performed to identify the best prediction model, the best classifier, and the most effective preprocessing method. The prediction model uses the F1 score as its main metric. The highest F1 score and accuracy are 75.68% and 88.61%, respectively, achieved by the Random Forest classifier on dataset model D3, which is obtained by removing the samples with missing or unknown values from the PIMA dataset.
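
As an illustration of the kind of workflow the abstract describes, the following minimal sketch tunes a Random Forest classifier with a grid search that selects hyperparameters by F1 score, then reports both F1 score and accuracy on held-out data. It is not the authors' code. The data are synthetic (generated with scikit-learn's `make_classification`) rather than the PIMA dataset, and the hyperparameter grid, the 80/20 split, and the 3-fold cross-validation are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a cleaned dataset (assumption, not PIMA):
# 8 features, roughly one third of the samples in the positive class.
X, y = make_classification(
    n_samples=400,
    n_features=8,
    n_informative=5,
    weights=[0.67, 0.33],
    random_state=0,
)

# Stratified split so that both partitions keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; the values are illustrative only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Candidates are ranked by cross-validated F1 score, not accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
y_pred = search.predict(X_test)  # uses the refitted best estimator
f1 = f1_score(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print("best hyperparameters:", best_params)
print("F1 score:", round(f1, 4), "accuracy:", round(acc, 4))
```

Selecting by F1 score rather than accuracy matters when the positive class is the minority: a classifier can reach high accuracy while still missing many positive cases, which is one reason the two figures can differ noticeably.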

Full Text