Abstract

Breast cancer is the most frequent cancer among women. Early detection is critical, and machine learning (ML) techniques can support it effectively. This article therefore investigates methods to improve the accuracy of ML classification models for the prognosis of breast cancer. A wrapper-based feature selection approach was used with nature-inspired and greedy search algorithms (Particle Swarm Optimization, Genetic Search, and Greedy Stepwise) to identify the important features. Three popular machine learning classifiers were then applied to the selected features: Support Vector Machine, J48 (the C4.5 decision tree algorithm), and Multilayer Perceptron (a feed-forward ANN). The methodology of the proposed system is structured into five stages: (1) data pre-processing; (2) data imbalance handling; (3) feature selection; (4) machine learning classification; and (5) evaluation of classifier performance. The dataset used in the experiments is the Breast Cancer Wisconsin (Diagnostic) Data Set from the UCI Machine Learning Repository. The results indicate that the J48 decision tree classifier is the appropriate machine learning-based classifier for optimal breast cancer prognosis. Support Vector Machine with Particle Swarm Optimization for feature selection achieves an accuracy of 98.24%, MCC = 0.961, sensitivity = 99.11%, specificity = 96.54%, and a Kappa statistic of 0.9606. The J48 decision tree classifier with Genetic Search for feature selection achieves an accuracy of 98.83%, MCC = 0.974, sensitivity = 98.95%, specificity = 98.58%, and a Kappa statistic of 0.9735. Furthermore, the Multilayer Perceptron ANN classifier with Genetic Search for feature selection achieves an accuracy of 98.59%, MCC = 0.968, sensitivity = 98.6%, specificity = 98.57%, and a Kappa statistic of 0.9682.
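
The evaluation measures reported above (accuracy, MCC, sensitivity, specificity, and Kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions. The function name `binary_metrics` and the confusion-matrix counts are hypothetical illustrations, not values from the article, and which class is treated as "positive" is an assumption.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary-classification measures from confusion-matrix counts.

    tp/fn/fp/tn are true positives, false negatives, false positives and
    true negatives. Which class counts as "positive" is an assumption here.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Matthews correlation coefficient (defined as 0 when a margin is empty).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
m = binary_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

MCC and Kappa are useful alongside accuracy because both account for class proportions, which matters when the two classes are not equally represented.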

Highlights

  • Breast cancer is the most frequent cancer among women, according to the World Health Organization

  • The paper presents a hybrid supervised machine learning classifier system for breast cancer prognosis using feature selection and data imbalance handling

  • The experimental results show that the Support Vector Machine with Particle Swarm Optimization for feature selection achieves an accuracy of 98.24%, an MCC of 0.961, a sensitivity of 99.11%, a specificity of 96.54%, and a Kappa statistic of 0.9606


Summary

Introduction

Breast cancer is the most frequent cancer among women, according to the World Health Organization. In metro cities such as Mumbai, Delhi, and Bangalore, breast cancer accounts for 25% to 32% of female cancers. The situation is becoming more serious because the disease is now increasingly seen in younger age groups. Once the data imbalance has been handled, applying feature selection algorithms yields the most important features, which play an important role in the accuracy of the classification model and also reduce computation time. Many such approaches have been proposed; we used nature-inspired algorithms.
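
To illustrate the wrapper idea described above, the sketch below runs a greedy stepwise (forward) search that scores each candidate feature subset by the cross-validated accuracy of a classifier trained on it. It rests on several assumptions: the nearest-centroid classifier is a simple stand-in for the classifiers used in the article, the data are a small synthetic toy set rather than the Wisconsin dataset, and the names `centroid_accuracy` and `greedy_forward_selection` are hypothetical.

```python
def centroid_accuracy(X, y, features, folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier on a feature subset."""
    n = len(X)
    correct = 0
    for k in range(folds):
        train_idx = [i for i in range(n) if i % folds != k]
        test_idx = [i for i in range(n) if i % folds == k]
        # One centroid per class, computed only on the selected features.
        centroids = {}
        for label in sorted(set(y[i] for i in train_idx)):
            rows = [X[i] for i in train_idx if y[i] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        for i in test_idx:
            pred = min(
                centroids,
                key=lambda lab: sum(
                    (X[i][f] - c) ** 2 for f, c in zip(features, centroids[lab])
                ),
            )
            correct += pred == y[i]
    return correct / n


def greedy_forward_selection(X, y, score):
    """Add the single best feature at each step; stop when the score no longer improves."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        scored = [(score(X, y, selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        best = top_score
    return selected, best


# Synthetic toy data (an assumption for illustration): feature 0 separates the
# two classes, while features 1 and 2 are unrelated to the label.
y = [i % 2 for i in range(60)]
X = [
    [2.0 * y[i] + ((i * 7) % 9 - 4) / 10, ((i * 13) % 17) / 17, ((i * 5) % 7) / 7]
    for i in range(60)
]

selected, best = greedy_forward_selection(X, y, centroid_accuracy)
print("selected features:", selected, "cv accuracy:", best)
```

On this toy data the search keeps only feature 0 and stops, since adding either uninformative feature cannot raise the score further. Particle Swarm Optimization and Genetic Search fit the same wrapper scheme but explore the space of feature subsets with a population-based search instead of one feature at a time.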
