Abstract

This paper proposes a hybrid feature selection sequence that combines filter and wrapper methods to improve the accuracy of Machine Learning (ML) based supervised classifiers in predicting the survivability of breast cancer patients as one of two classes, living or deceased, using the METABRIC and Surveillance, Epidemiology and End Results (SEER) datasets. The classifiers used in the analysis are Multiple Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine and Multilayer Perceptron. The proposed workflow comprises the following stages: data cleaning, data balancing, feature selection via a filter and wrapper sequence, cross-validation-based training, testing and performance evaluation. The results are compared in terms of the following classification metrics: Accuracy, Precision, F1 score, True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, Area under the Receiver Operating Characteristic curve, Area under the Precision-Recall curve and Matthews Correlation Coefficient. The comparison shows that the proposed feature selection sequence yields better results for every supervised classifier than any of the other feature selection sequences considered in the analysis.
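
As an illustration of the threshold-based metrics listed above, the following minimal sketch derives them from the four counts of a binary confusion matrix, treating one class as positive. The function name `confusion_metrics`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration; they are not taken from the paper and the numbers are not results from it. The two area-under-curve metrics are omitted because they require ranked classifier scores rather than counts.

```python
import math


def _safe_div(numerator, denominator):
    # Assumed convention for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute count-based binary classification metrics.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative
    and false-negative counts of a binary confusion matrix.
    """
    precision = _safe_div(tp, tp + fp)
    tpr = _safe_div(tp, tp + fn)  # True Positive Rate (recall, sensitivity)
    tnr = _safe_div(tn, tn + fp)  # True Negative Rate (specificity)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "tpr": tpr,
        "tnr": tnr,
        "fpr": _safe_div(fp, fp + tn),  # False Positive Rate = 1 - TNR
        "fnr": _safe_div(fn, fn + tp),  # False Negative Rate = 1 - TPR
        "f1": _safe_div(2 * precision * tpr, precision + tpr),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Made-up counts for illustration only (not data from the paper).
example = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The Matthews Correlation Coefficient uses all four counts, so it stays informative when the two classes are imbalanced, which is one reason it is often reported alongside Accuracy.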
