Diabetes Prediction Using Machine Learning Ensemble Model

Ong Yee Hang, Rosly Rosaida, Wiwied Virgiyanti

doi:10.37934/araset.37.1.8298


Open Access



Abstract

Malaysia's National Health and Morbidity Survey revealed that one-fifth of Malaysian adults have been diagnosed with diabetes. The disease occurs across age groups and often goes undetected, especially among young people, because testing can only be performed at certain facilities with special equipment. It is therefore essential to develop a tool capable of generating high-accuracy predictions. This research performed feature selection on a secondary dataset of seventeen attributes, containing no irrelevant data or missing values, and fed it into three models: AdaBoost with a Decision Tree as the base algorithm, a Support Vector Machine (SVM), and an ensemble model developed using machine learning techniques. The five most influential features in the dataset were selected using SelectKBest for each model, and training and testing on these features achieved higher prediction accuracy. The predictions from the three models were compared, with the ensemble model combining the results from AdaBoost and SVM. A diabetes prediction prototype was developed to compare the accuracy of the three methods on the observed dataset. This research concludes that the ensemble model gives the highest accuracy for diabetes prediction and may be considered the most suitable method for diabetes prediction tools.
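The pipeline described in the abstract can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' implementation. Several details are assumptions because the abstract does not state them: the synthetic data from `make_classification` stands in for the real diabetes dataset, `f_classif` is assumed as the `SelectKBest` scoring function, the hyperparameters and train/test split are arbitrary, and soft voting is assumed as the way the AdaBoost and SVM outputs are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, NOT the dataset used in the paper.
X, y = make_classification(
    n_samples=500, n_features=16, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def top5():
    """Keep the five highest-scoring features (scoring function is assumed)."""
    return SelectKBest(score_func=f_classif, k=5)


models = {
    # AdaBoost's default base learner in scikit-learn is a depth-1 decision tree.
    "adaboost": make_pipeline(
        top5(), AdaBoostClassifier(n_estimators=50, random_state=0)
    ),
    # probability=True lets the SVM contribute class probabilities to soft voting.
    "svm": make_pipeline(
        top5(), StandardScaler(), SVC(probability=True, random_state=0)
    ),
}

# Assumption: the ensemble averages predicted probabilities (soft voting).
# VotingClassifier clones and refits its estimators, so the standalone
# pipelines above are trained independently of the ensemble.
models["ensemble"] = VotingClassifier(
    estimators=[("adaboost", models["adaboost"]), ("svm", models["svm"])],
    voting="soft",
)

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracy[name]:.3f}")
```

Placing `SelectKBest` inside each pipeline means the features are chosen from the training split only, which avoids leaking test data into the selection step. The scores printed on synthetic data say nothing about the results reported in the paper.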

Full Text