Abstract

Breast cancer is an all too common disease in women, which makes its effective prediction an active research problem. A number of statistical and machine learning techniques have been employed to develop breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct an SVM classifier, a kernel function must first be chosen, and different kernel functions can yield different prediction performance. However, very few studies have examined the prediction performance of SVM with different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve on the performance of single classifiers, can outperform single SVM classifiers in breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational time for training SVM and SVM ensembles are compared. The experimental results show that linear kernel SVM ensembles built with the bagging method and RBF kernel SVM ensembles built with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel SVM ensembles built with boosting perform better than the other classifiers.

Highlights

  • Breast cancer prediction has long been regarded as an important research problem in the medical and healthcare communities

  • The best performances are obtained by genetic algorithm (GA) + linear support vector machines (SVM) for classification accuracy (96.85%), GA + linear SVM for receiver operating characteristic (ROC) (0.967), and GA + radial basis function (RBF) SVM for F-measure (0.988)

  • There is a large reduction in the computational times for training the SVM classifiers after performing feature selection compared with the baseline SVM classifiers without feature selection (doi:10.1371/journal.pone.0161501.g003)
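
The three quality measures named in the highlights (classification accuracy, ROC, and F-measure) can be illustrated with a minimal sketch. It assumes a binary task with labels 1 (positive) and 0 (negative), takes "ROC" to mean the area under the ROC curve, and takes "F-measure" to mean the balanced F1 score; the paper may define them differently. The function names and the toy labels and scores below are hypothetical and are not taken from the paper's experiments.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(y_true, y_pred):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier decision scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred))   # 0.75
print(f_measure(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))    # 0.9375
```

Accuracy and F-measure depend on the chosen decision threshold, whereas the ROC area depends only on how the scores rank positives against negatives, which is one reason the best classifier can differ from metric to metric.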

Summary

Introduction

Breast cancer prediction has long been regarded as an important research problem in the medical and healthcare communities. This cancer develops in the breast tissue [1]. There are several risk factors for breast cancer, including female sex, obesity, lack of physical exercise, drinking alcohol, hormone replacement therapy during menopause, ionizing radiation, early age at first menstruation, having children late or not at all, and older age. There are different types of breast cancer, with different stages or spread, aggressiveness, and genetic makeup. It would be very useful to have a system that allows early detection and prevention, which would increase the survival rates for breast cancer.
