A Comparative Analysis of Methods for Detecting and Diagnosing Breast Cancer Based on Data Mining

Ahmed T, Marwa M Eid, El-Sayed M El, Alhumaima Ali, Hussein

doi:10.54216/jaim.040201

Abstract

Breast cancer is a significant public health concern worldwide, and early detection is crucial for its treatment. Although breast cancer has been extensively studied, there is still room for improvement in classification accuracy. This study aims to improve the classification accuracy of breast cancer by applying information gain feature selection and machine learning techniques to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The information gain method is used to reduce the number of features, and three machine learning algorithms, support vector machine (SVM), naive Bayes (NB), and the C4.5 decision tree, are employed for breast cancer classification. The study also conducts a comparative analysis based on accuracy. With the C4.5 decision tree, the proposed model reaches its highest classification accuracy (100%), with weighted-average precision and recall of 100% each. SVM achieves an accuracy of 98.42%, with weighted-average precision of 98.17% and recall of 98.58%. The NB algorithm attains an accuracy of 96%, with weighted-average precision of 18.57% and recall of 50%. Compared with similar studies, the proposed model's results show a marked improvement, indicating new opportunities for breast cancer detection.
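To make the feature selection step concrete, the following is a minimal sketch of ranking numeric features by information gain and keeping the top k. It is not the authors' implementation: the function names (entropy, information_gain, best_gain, select_top_k), the midpoint-threshold search, and the toy values are assumptions made for illustration, and only the feature names echo WDBC-style columns. Note also that the C4.5 algorithm itself splits on gain ratio, whereas this sketch computes plain information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_gain(values, labels):
    """Best information gain over midpoints between distinct sorted values."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in thresholds), default=0.0)


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(features, key=lambda name: best_gain(features[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data: "M" = malignant, "B" = benign; all values are made up.
labels = ["M", "M", "M", "B", "B", "B"]
features = {
    "radius_mean": [20.1, 18.5, 19.7, 12.3, 11.9, 13.0],   # separates the classes cleanly
    "texture_mean": [17.0, 22.0, 19.0, 18.0, 21.0, 20.0],  # weakly informative
    "sample_noise": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],        # nearly uninformative
}
selected = select_top_k(features, labels, k=2)
print(selected)
```

In a full pipeline, the selected feature subset would then be passed to each classifier (SVM, NB, decision tree) and the resulting accuracy, precision, and recall compared on held-out data.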

Full Text