Abstract

Breast cancer (BC) is considered the most common cause of cancer deaths in women. This study aims to identify BC early using machine learning algorithms and feature selection methods. The overall methodology is adapted from the knowledge discovery in databases (KDD) process and comprises four datasets, a preprocessing phase (data cleaning and splitting the data into training and testing sets), a processing phase (feature selection, k-fold validation, and classification), and finally model evaluation. This paper compares several classifiers: decision tree (DT), random forest (RF), logistic regression (LR), Naive Bayes (NB), k-nearest neighbors (KNN), and support vector machine (SVM). Four breast cancer datasets are used in the experiments: Wisconsin Prognostic Breast Cancer (WPBC), Wisconsin Diagnostic Breast Cancer (WDBC), Wisconsin Breast Cancer (WBC), and the Mammographic Mass Dataset (MM-Dataset) based on BI-RADS findings. The proposed models were evaluated using classification accuracy and the confusion matrix. On the WBC dataset, RF with a Genetic Algorithm (GA) as the feature selection method outperforms the other classifiers, with an accuracy of 96.82%. On the WDBC dataset, C-SVM with the radial basis function (RBF) kernel outperforms the other classifiers, with an accuracy of 99.04%. On the WPBC dataset, RF with recursive feature elimination (RFE) as the feature selection method outperforms the other classifiers, with an accuracy of 74.13%. On the MM-Dataset, DT outperforms the other classifiers, with an accuracy of 83.74%. The findings indicate that the proposed models are effective compared with other existing models.
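The evaluation steps named above (k-fold validation, confusion matrix, classification accuracy) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `k_fold_indices`, `confusion_matrix`, and `accuracy` are hypothetical, the fold scheme (contiguous, unshuffled) is an assumption, and the labels are made up for illustration rather than drawn from any of the four datasets.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification; the abstract does not say
    which variant of k-fold validation was used.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_matrix(y_true, y_pred, positive="malignant"):
    """Return (TP, FP, FN, TN) counts for a binary classification task."""
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]    # malignant predicted as malignant
    fp = counts[(False, True)]   # benign predicted as malignant
    fn = counts[(True, False)]   # malignant predicted as benign
    tn = counts[(False, False)]  # benign predicted as benign
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical labels for illustration only.
y_true = ["malignant", "benign", "benign", "malignant", "benign", "malignant"]
y_pred = ["malignant", "benign", "malignant", "malignant", "benign", "benign"]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
folds = k_fold_indices(len(y_true), 3)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", round(acc, 4))
print("folds:", folds)
```

In a full k-fold run, each fold would serve once as the test set while the remaining folds train the classifier, and the per-fold accuracies would then be averaged.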

