Abstract

Breast cancer is a prevalent disease that affects mostly women, and early diagnosis will expedite the treatment of this ailment. Recently, machine learning (ML) techniques have been employed in biomedical and informatics to help fight breast cancer. Extracting information from data to support the clinical diagnosis of breast cancer is a tedious and time-consuming task. The use of machine learning and feature extraction techniques has significantly changed the whole process of a breast cancer diagnosis. This research work proposed a machine learning model for the classification of breast cancer. To achieve this, a support vector machine (SVM) was employed for the classification, and linear discriminant analysis (LDA) was employed for feature extraction. We measured our model’s feature extraction performance in principal component analysis (PCA) and random forest for classification. A comparative analysis of the proposed model was performed to show the effectiveness of the feature extraction, and we computed missing values based on the classifier’s accuracy, precision, and recall. The original Wisconsin Breast Cancer dataset (WBCD) and Wisconsin Prognostic Breast Cancer dataset (WPBC) were used. We evaluated performance in two phases: In phase 1, rows containing missing values were computed using the mean, and in phase 2, rows containing missing values were computed using the median. LDA–SVM when median was used to compute missing values has better results, with accuracy of 99.2%, recall of 98.0% and precision of 98.0% on the WBCD dataset and an accuracy of 79.5%, recall of 76.0% and precision of 59.0% on the WPBC dataset. The SVM classifier had a better performance in handling classification problems when LDA was applied and the median was used as a method for computing missing values.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call