Abstract

Diabetes is a chronic disease that can cause long-term complications if it is not managed properly. To help prevent this, a machine learning model that predicts diabetes with high accuracy is needed. This study examines the effect of reducing the number of features, and the effect of data cleaning, on model performance. Using the Pima Indian Dataset, two models were built with different preprocessing stages: the first without data cleaning and the second with data cleaning. After the subsequent preprocessing stage, Sequential Forward Selection was used to find the number of features that gives the best performance, and each model was trained with the Support Vector Machine algorithm. After training, the two models were tested and their performance was compared. The results show that reducing the number of features improved model performance and that, of the two models, the one built with the data cleaning stage performed better.
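
The feature search described above can be illustrated with a minimal sketch of greedy Sequential Forward Selection. This is not the study's implementation. The function name `sequential_forward_selection` and the scorer `toy_score` are hypothetical, and the toy scorer merely stands in for whatever the study measured (for example, the accuracy of a Support Vector Machine trained on the chosen features). The value 8 matches the number of predictor columns in the Pima dataset.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Greedy forward selection over feature indices 0..n_features-1.

    `score` receives a tuple of feature indices and returns a number where
    higher is better. The best-scoring subset seen across all subset sizes
    is returned, so a smaller subset can beat the full feature set.
    """
    selected: List[int] = []
    remaining = list(range(n_features))
    best_subset: List[int] = []
    best_score = float("-inf")

    while remaining:
        # Try each remaining feature and keep the one that scores best
        # (ties are broken in favour of the larger feature index).
        step_score, step_feature = max(
            (score(tuple(selected + [f])), f) for f in remaining
        )
        selected.append(step_feature)
        remaining.remove(step_feature)
        # Remember the best subset found so far across all sizes.
        if step_score > best_score:
            best_score = step_score
            best_subset = list(selected)

    return best_subset, best_score


def toy_score(features: Sequence[int]) -> float:
    # Hypothetical stand-in scorer (an assumption, not the study's metric):
    # features 1 and 5 help, and every added feature carries a small cost.
    useful = {1: 0.20, 5: 0.10}
    return 0.5 + sum(useful.get(f, 0.0) for f in features) - 0.01 * len(features)


subset, value = sequential_forward_selection(8, toy_score)
print(subset, round(value, 2))
```

In a real experiment, `score` would be replaced by a function that trains and evaluates the classifier on the given columns. Running the search once on the raw data and once on the cleaned data would give the two models that the study compares.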
