Abstract

Breast cancer is the most common cancer among women worldwide. In 2020 alone, it affected about 0.68 million people and accounted for 6.9% of all cancer cases. Distinguishing benign (non-cancerous) from malignant (cancerous) tumors is one of the main obstacles to diagnosis. This study aims to support accurate and reliable diagnosis from initial tumor measurements, such as smoothness, texture, and area, using machine learning models. It applies five machine learning models, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes Classifier (NBC), within three modelling systems, the full model, feature selection-ML, and principal component analysis (PCA)-ML, to predict tumor type in the Wisconsin Breast Cancer Dataset. Model performance is assessed with three metrics: accuracy, precision, and recall. For the full model, random forest has the highest prediction accuracy, 98.25% out-of-sample and 100% in-sample, while the SVM with a sigmoid kernel has the lowest, 83.33% out-of-sample and 85.27% in-sample. For the feature selection models based on RF and LR, the RF with only 13 variables has the highest prediction accuracy, 98.25% out-of-sample and 100% in-sample. Among the PCA-ML models, PCA-NBC has the highest out-of-sample prediction accuracy, 97.33%, whereas PCA-RF has the highest in-sample accuracy, 100%.
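The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn's bundled copy of the Wisconsin diagnostic data (`load_breast_cancer`), an 80/20 stratified split, default hyperparameters, an RBF kernel for the SVM, and five principal components. None of these settings is stated in the abstract, and the helper name `evaluate` is hypothetical. The feature selection-ML system is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the paper's data.
# In this copy, label 0 is malignant and label 1 is benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: 80/20 stratified split with a fixed seed (not given in the abstract).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The five model families named in the abstract, with assumed default settings.
MODELS = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "RF": lambda: RandomForestClassifier(random_state=0),
    "SVM": lambda: SVC(kernel="rbf"),
    "KNN": lambda: KNeighborsClassifier(),
    "NBC": lambda: GaussianNB(),
}


def evaluate(use_pca=False, n_components=5):
    """Fit each model and return out-of-sample accuracy, precision, and recall.

    Hypothetical helper: with use_pca=True, a PCA step is placed between
    scaling and the classifier (the "PCA-ML" system); otherwise all
    features are used (the "full model").
    """
    results = {}
    for name, make_model in MODELS.items():
        steps = [StandardScaler()]
        if use_pca:
            steps.append(PCA(n_components=n_components))
        steps.append(make_model())
        pipeline = make_pipeline(*steps).fit(X_train, y_train)
        predicted = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            # pos_label=0 treats malignant as the positive class.
            "precision": precision_score(y_test, predicted, pos_label=0),
            "recall": recall_score(y_test, predicted, pos_label=0),
        }
    return results


if __name__ == "__main__":
    for label, use_pca in [("full model", False), ("PCA-ML", True)]:
        print(label)
        for name, scores in evaluate(use_pca=use_pca).items():
            print(f"  {name}: " + ", ".join(f"{k}={v:.4f}" for k, v in scores.items()))
```

Scores from this sketch depend on the split, seed, and hyperparameters, so they should not be expected to reproduce the figures reported above.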
