An efficient ensemble-based Machine Learning for breast cancer detection

Ramdas Kapila, Sumalatha Saleti

doi:10.1016/j.bspc.2023.105269

Abstract

Breast cancer is a severe type of cancer that develops in breast cells. Despite substantial advances in the management of symptomatic breast cancer over the past ten years, an effective predictive model for breast cancer prognosis is still urgently needed. Precise prediction offers numerous advantages, including the ability to diagnose cancer at an early stage and to protect patients from needless medical care and related costs. In the medical field, recall is just as important as accuracy: a model with high accuracy but low recall is of little practical use. To boost accuracy while still assigning equal weight to recall, we propose a model that combines Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. The proposed model has three stages. In the first stage, the Correlation Coefficient (CC) and ANOVA (Anv) feature selection methods choose the features. In the second stage, Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) extract features without compromising crucial information. In the last stage, the 5 ML models and ensemble models such as the Voting Classifier (VC) and Stacking Classifier (SC) predict the disease from the selected and extracted features. The results show that the proposed model, CC-Anv with PCA using the SC, outperformed all existing methodologies, with 100% accuracy, precision, recall, and F1-score.
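The abstract's point that high accuracy can hide low recall can be illustrated with a small worked example. The sketch below is not taken from the paper: the function name `accuracy_and_recall` and the patient counts are hypothetical, chosen only to show how the two metrics diverge on an imbalanced cohort.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                    # share of all predictions that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # share of true cases that are detected
    return accuracy, recall


# Hypothetical cohort (illustrative numbers, not from the paper):
# 1000 patients, 50 of whom have cancer. The classifier detects only
# 10 of the 50 true cases and raises no false alarms.
acc, rec = accuracy_and_recall(tp=10, fp=0, tn=950, fn=40)
print(f"accuracy={acc:.3f}, recall={rec:.3f}")  # accuracy=0.960, recall=0.200
```

Under these assumed counts the classifier is 96% accurate yet misses 80% of the cancer cases, which is the failure mode that motivates weighting recall equally with accuracy.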

Full Text