Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis.

Sara Ibrahim, Sergio A. Velastin, Saima Nazir

doi:10.3390/jimaging7110225

Abstract

Breast cancer is one of the leading causes of cancer death among women. Accurate diagnosis of breast cancer is difficult because of the complexity of the disease, changing treatment procedures, and differences between patient populations. Better-performing diagnostic techniques are important for personalized care and treatment, and for reducing and controlling cancer recurrence. The main objective of this research was to apply feature selection based on correlation analysis and the variance of the input features before passing the selected features to a classifier. We used an ensemble method to improve the classification of breast cancer. The proposed approach was evaluated on the public Wisconsin Breast Cancer Dataset (WBCD). Correlation analysis and principal component analysis were used for dimensionality reduction. Well-known machine learning classifiers were evaluated, and the seven best classifiers were chosen for the next step. Hyper-parameter tuning was performed to improve the classifiers' performance. The best-performing classification algorithms were then combined using two voting techniques: hard voting predicts the class that receives the majority of votes, whereas soft voting predicts the class with the highest (averaged) predicted probability. The proposed approach outperformed the state-of-the-art work it was compared against, achieving an accuracy of 98.24%, a precision of 99.29% and a recall of 95.89%.
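
The two voting rules described in the abstract can disagree on the same sample. The sketch below is illustrative only: it assumes three hypothetical classifiers, made-up probabilities, and the labels "B" (benign) and "M" (malignant), and shows a case where hard voting and soft voting give different answers.

```python
from collections import Counter

def hard_vote(labels):
    """Return the class predicted by the most classifiers (majority vote).

    Counter.most_common breaks ties by the order in which labels first appear.
    """
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas):
    """Return the class with the highest mean predicted probability.

    probas: one dict per classifier, mapping class label -> probability.
    """
    classes = probas[0].keys()
    mean = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(mean, key=mean.get)

# Hypothetical outputs of three classifiers for one tumour sample (not real data).
probas = [
    {"B": 0.60, "M": 0.40},
    {"B": 0.55, "M": 0.45},
    {"B": 0.10, "M": 0.90},
]
# Each classifier's own vote is its most probable class.
labels = [max(p, key=p.get) for p in probas]

print(hard_vote(labels))   # B: two of the three classifiers vote benign
print(soft_vote(probas))   # M: mean P(M) is about 0.583, mean P(B) about 0.417
```

Soft voting lets one very confident classifier outweigh two weakly confident ones, which is why the two rules can diverge. scikit-learn's `sklearn.ensemble.VotingClassifier` exposes the same two rules through `voting="hard"` and `voting="soft"`; which library the authors used is not stated on this page.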

Highlights

We evaluated the performance of logistic regression, support vector machine, k-nearest neighbors, stochastic gradient descent, naïve Bayes, random forest, and decision tree classifiers
The results show that the support vector machine (SVM) outperformed both the decision tree and the multilayer perceptron (MLP)
…Iterative Dichotomiser 3 (ID3) trees with no pruning. It makes a final prediction based on the mean of the individual trees' predictions, and it tends to be robust to overfitting, mainly because averaging many trees' predictions reduces the variance of the result
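
Why averaging many unpruned trees helps can be seen with a small simulation. This sketch does not build real trees: each hypothetical "tree" simply reports an assumed true probability of 0.7 plus independent Gaussian noise (standing in for the high variance of an unpruned tree), and the forest prediction is the mean of those outputs, thresholded at an assumed 0.5.

```python
import random
import statistics

def forest_predict(tree_probs, threshold=0.5):
    """Average the per-tree probabilities of the positive class and threshold the mean."""
    mean_p = statistics.fmean(tree_probs)
    return mean_p, int(mean_p >= threshold)

# Simulation only (not the paper's data or a real forest).
rng = random.Random(0)

def noisy_tree(p_true=0.7, noise=0.25):
    """Hypothetical tree output: true probability plus noise, clipped to [0, 1]."""
    return min(1.0, max(0.0, p_true + rng.gauss(0.0, noise)))

single_trees = [noisy_tree() for _ in range(2000)]
forests = [forest_predict([noisy_tree() for _ in range(100)])[0] for _ in range(2000)]

print(f"std of one tree's output:  {statistics.pstdev(single_trees):.3f}")
print(f"std of a 100-tree average: {statistics.pstdev(forests):.3f}")
```

With independent noise the spread shrinks by roughly a factor of the square root of the number of trees (about 10 here). Trees in a real forest are correlated, so the reduction in practice is smaller, which is why random forests also randomize the features considered at each split.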

Summary

Introduction

Benign: If the cells are not cancerous, the tumor is benign (not dangerous to health). It will not invade nearby tissues or spread to other areas of the body (metastasize).

Malignant: Some cancer cells can move into the bloodstream or lymph nodes and from there spread to other tissues in the body, a process known as metastasis. A malignant tumor is more dangerous and can cause death.

Invasive ductal carcinoma (IDC): It begins in a milk duct and can spread to the surrounding breast tissue. It is the most common type of breast cancer.

The work proposed here highlights the value of combining the best-performing machine learning classifiers with ensemble techniques for accurate diagnosis of breast cancer.

Literature Review

Methodology

Data Pre-Processing

Dimensionality Reduction Using Correlation Analysis
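
The abstract states that correlation analysis was used for dimensionality reduction but does not give the exact procedure. A common approach, sketched here under assumptions, is to drop any feature whose absolute Pearson correlation with an already-kept feature exceeds a cutoff. The 0.9 threshold, the greedy keep-first order, and the toy column names and values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every already-kept feature is below threshold.

    features: dict mapping feature name -> list of values (insertion order is the priority).
    """
    kept = []
    for name, col in features.items():
        if all(abs(pearson(col, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy columns: "perimeter" is almost a linear function of "radius".
features = {
    "radius":    [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [63.0, 75.5, 88.0, 100.0, 113.5],
    "texture":   [20.0, 15.0, 22.0, 18.0, 17.0],
}
print(drop_correlated(features))  # ['radius', 'texture']
```

Here r(radius, perimeter) is about 0.9998, so "perimeter" is treated as redundant, while r(radius, texture) is about -0.18 and "texture" is kept. With pandas, `DataFrame.corr()` produces the full correlation matrix in one call.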

Dimensionality Reduction Using Principal Component Analysis
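
The abstract lists principal component analysis (PCA) as the second dimensionality-reduction step and mentions the variance of the input features. A minimal sketch, assuming NumPy is installed, that features are standardized first because they are on different scales, and that the number of components is chosen to retain a hypothetical 95% of the variance:

```python
import numpy as np

def pca_reduce(X, var_to_keep=0.95):
    """Project X (samples x features) onto the fewest principal components
    whose cumulative explained-variance ratio reaches var_to_keep."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature (assumption; constant columns would divide by zero).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = S**2 / np.sum(S**2)  # explained-variance ratio per component
    # First index where the cumulative ratio reaches the target, plus one.
    k = int(np.searchsorted(np.cumsum(ratio), var_to_keep)) + 1
    k = min(k, len(ratio))
    return Z @ Vt[:k].T, ratio

# Toy data (hypothetical): the second feature is roughly twice the first.
X = [[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]]
scores, ratio = pca_reduce(X)
print(scores.shape)  # (5, 1)
```

Because the two toy features are almost perfectly correlated, one component already explains more than 99% of the variance and the data collapse to a single column. scikit-learn's `PCA` offers the same selection rule when `n_components` is a float between 0 and 1 (with `svd_solver="full"`).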

Feature Selection by Using a Wrapper Subset Selection Method

Breast Cancer Tumor Classification

Naïve Bayes Classification

Decision Tree

The Random Decision Forest Method

Ensemble Classification

Experimentation and Discussion

Results and Discussion

Comparison with Existing Work

Conclusions and Future Work

Journal: Journal of Imaging
Publication Date: Oct 26, 2021
Citations: 28
License type: CC BY 4.0

Similar Papers

A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis
Temidayo Oluwatosin Omotehinwa ... Emmanuel Gbenga Dada
Healthcare Analytics | VOL. 4 | 23 Jun 2023

Breast Cancer Detection Using Machine Learning Techniques
Sarthak Vyas ... Noman Ansari
International Journal for Research in Applied Science and Engineering Technology | VOL. 10 | 31 May 2022

RF-PCA: A New Solution for Rapid Identification of Breast Cancer Categorical Data Based on Attribute Selection and Feature Extraction
Kai Bian ... Mengran Zhou
Frontiers in Genetics | VOL. 11 | 09 Sep 2020

Improving Breast Cancer Diagnosis Accuracy by Particle Swarm Optimization Feature Selection
Reihane Kazerani
International Journal of Computational Intelligence Systems | VOL. 17 | 13 Mar 2024