Abstract

Breast cancer is one of the most common malignancies among women in Saudi Arabia and has been ranked as the most prevalent and the second deadliest disease in the country. However, the clinical diagnosis of diseases such as breast cancer, coronary artery disease, diabetes, and COVID-19 is often associated with uncertainty because the process is complex and fuzzy. This work proposes a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia. The system addresses two problems: the uncertainty and ambiguity associated with diagnosing breast cancer, and the heavier burden that insignificant features place on the overlay of the network nodes of the fuzzy neural network system. The improved Gini index random forest-based feature importance measure algorithm was used to select the five fittest features out of the 32 features of the Wisconsin Diagnostic Breast Cancer dataset. Logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with the full 32 features and models with the five fittest features. Both sets of models were evaluated and the results compared. The comparison shows that the models with the selected fittest features outperformed their full-feature counterparts in terms of accuracy, sensitivity, and specificity. A fuzzy neural network-based expert system was therefore developed with the five selected fittest features, and it achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity.
Moreover, compared with previous works that applied fuzzy neural networks or other artificial intelligence techniques to the same dataset for breast cancer diagnosis, the system performs best in terms of accuracy, sensitivity, and specificity. A z test was also conducted, and its result shows that the accuracy achieved by the system for early diagnosis of breast cancer is statistically significant.
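
The three reported metrics follow from the counts of a binary confusion matrix. The following is a minimal sketch of that arithmetic in Python; the function name and the counts are hypothetical and chosen only for illustration, and they are not results from this work.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: malignant cases correctly flagged, fn: malignant cases missed,
    tn: benign cases correctly cleared, fp: benign cases wrongly flagged.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, used only to illustrate the calculation.
acc, sens, spec = diagnostic_metrics(tp=40, fn=10, tn=95, fp=5)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can hide a model that misses malignant cases when benign cases dominate the data.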

Highlights

  • Cancer is a group of diseases that are characterized by the uncontrollable spread and growth of abnormal cells [1,2]

  • The conventional clinical diagnosis of diseases is often associated with uncertainty and ambiguity because diagnosing most deadly diseases, such as breast cancer, coronary artery disease, diabetes, and waterborne diseases, is complex and fuzzy [20,22,23]

  • Using records of healthy, mild, and severe cases of breast cancer from 300 randomly selected instances of the Wisconsin Diagnostic Breast Cancer dataset, the null hypothesis that the fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm achieves no significant accuracy in diagnosing breast cancer is rejected, since the p-value obtained from the test (p < 0.0001) is below the 0.05 significance level
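
The decision rule in the last highlight (reject the null hypothesis when the p-value is below 0.05) can be sketched as follows. The source does not state which form of z test was used or what accuracy the null hypothesis assumed, so this sketch assumes a one-sample, one-sided test of a proportion; the function name, the count of 298 correct diagnoses out of 300, and the null accuracy of 0.5 are all illustrative assumptions rather than figures from this work.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """One-sided z test of H0: p <= p0 against H1: p > p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical inputs: 298 correct diagnoses out of 300 cases, tested
# against an assumed null accuracy of 0.5.
z, p_value = one_proportion_z_test(successes=298, n=300, p0=0.5)
ALPHA = 0.05  # significance level quoted in the text
reject_null = p_value < ALPHA
print(f"z = {z:.2f}, reject H0: {reject_null}")
```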

Summary

Introduction

Cancer is a group of diseases characterized by the uncontrolled growth and spread of abnormal cells [1,2]. Most datasets contain many features, many of which may be insignificant or even irrelevant for diagnosing diseases [34,35,36]. If these insignificant features are not removed, they can place a heavier burden on the overlay of the network nodes of the fuzzy neural network system, which reduces diagnostic accuracy, increases training time, and makes the system's diagnostic results difficult to interpret [37,38,39,40]. The motivation of this work is to develop a fuzzy neural network-based expert system for early diagnosis of breast cancer in Saudi Arabia that addresses both the uncertainty and ambiguity of the diagnosis process and the heavier burden that insignificant features place on the overlay of the network nodes of the fuzzy neural network system.
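
To make the idea of discarding insignificant features concrete, the sketch below ranks features by Gini impurity decrease, the quantity that a random forest aggregates over its trees to produce Gini-based feature importance. It is not the improved algorithm proposed in this work, whose details are not given here: it scores each feature by its single best threshold split rather than by a full forest, and the function names, feature names, and toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest weighted Gini decrease over all threshold splits of one feature."""
    parent = gini_impurity(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(rows, labels, names, k):
    """Return the k feature names with the largest Gini decrease."""
    gains = {name: best_split_gain([row[i] for row in rows], labels)
             for i, name in enumerate(names)}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy data (hypothetical): B = benign, M = malignant.
names = ["radius", "texture", "noise"]
rows = [
    [10.0, 15.0, 3.0],
    [11.0, 22.0, 1.0],
    [12.0, 16.0, 4.0],
    [18.0, 23.0, 2.0],
    [19.0, 17.0, 3.5],
    [20.0, 24.0, 1.5],
]
labels = ["B", "B", "B", "M", "M", "M"]

top_two = rank_features(rows, labels, names, k=2)
radius_gain = best_split_gain([row[0] for row in rows], labels)
print(top_two)
```

In this toy data the first feature separates the two classes perfectly, so it receives the largest impurity decrease, while the third feature carries little class information and is dropped when only the top two are kept.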

Related Work
Classification Models with Full Features
Classification Models with Selected Fittest Features
Selection of the Size of Training Dataset
Input Layer
Fuzzy Rules Layers
Inference Layer
Defuzzification Layer
Classification Models with full features
Statistical Testing
Conclusions
Findings
Limitation and Future Direction