Abstract

Diabetes is a chronic disease that can cause long-term complications if it is not managed properly. Predicting diabetes with an accurate machine learning model can help prevent these complications. This study examines the effect of reducing feature dimensionality and the effect of data cleaning on model performance. Using the Pima Indian Dataset, two models were built with different preprocessing stages: the first without data cleaning and the second with data cleaning. After the remaining preprocessing steps, Sequential Forward Selection was used to find the number of features that yields the best performance, and each model was trained with the Support Vector Machine algorithm. After training, the two models were tested and their performance was compared. The results show that reducing the number of features improved model performance and that, of the two models, the one built with the data cleaning stage performed better.
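The feature search described above can be illustrated with a minimal sketch of Sequential Forward Selection. This is not the study's implementation: the function names are hypothetical, the tiny dataset is made up for illustration (it is not the Pima Indian Dataset), and a leave-one-out nearest-centroid classifier stands in for the Support Vector Machine so that the example needs only the Python standard library. In practice the scoring function would be replaced by the validation accuracy of the trained SVM.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Greedy SFS: add one feature per step, remember the best subset seen."""
    selected: List[int] = []
    remaining = list(range(n_features))
    best_subset: List[int] = []
    best_score = float("-inf")
    while remaining:
        # Score every candidate subset formed by adding one remaining feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        step_score = max(s for s, _ in candidates)
        # On ties, take the first feature that reaches the step's best score.
        step_feature = next(f for s, f in candidates if s == step_score)
        selected.append(step_feature)
        remaining.remove(step_feature)
        # Only a strict improvement replaces the best subset so far.
        if step_score > best_score:
            best_score = step_score
            best_subset = list(selected)
    return best_subset, best_score


def loo_nearest_centroid_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], features: List[int]
) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

        def squared_distance(label: int) -> float:
            return sum((X[i][f] - c) ** 2 for f, c in zip(features, centroids[label]))

        predicted = min(sorted(centroids), key=squared_distance)
        correct += predicted == y[i]
    return correct / len(X)


# Made-up toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 4.0, 0.1], [1.1, 2.0, 0.7],
    [3.0, 4.5, 0.2], [3.2, 1.5, 0.8], [2.9, 5.5, 0.4], [3.1, 0.5, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best_subset, best_score = sequential_forward_selection(
    n_features=3,
    score=lambda feats: loo_nearest_centroid_accuracy(X, y, feats),
)
print(best_subset, best_score)
```

On this toy data the search keeps only feature 0, and scoring all three features together gives a lower accuracy than that single feature, which mirrors the kind of effect the abstract reports for feature reduction.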
