A Comparative Study of Multilayer Neural Network and C4.5 Decision Tree Models for Predicting the Risk of Breast Cancer

Soolmaz Sohrabi, Ali Dadashi, Amir Atashi, Sina Marashi

doi:10.19187/abc.20185111-14

Abstract

Background: Diagnosing breast cancer at an early stage can have a great impact on cancer mortality. A fundamental problem in cancer treatment is the lack of a proper method for early detection, which may lead to diagnostic errors. Data analysis techniques can significantly help in early diagnosis of the disease. The purpose of this study was to evaluate and compare the efficacy of two data mining techniques, a multilayer neural network and the C4.5 decision tree, in the early diagnosis of breast cancer. Methods: A data set from the breast cancer research clinic of Motamed Cancer Institute, Tehran, containing 2860 records related to breast cancer risk factors was used. Of these records, 1141 (40%) were related to malignant changes and breast cancer and 1719 (60%) to benign tumors. The data set was split into a training set (70%) and a testing set (30%) and analyzed with multilayer perceptron neural network and decision tree algorithms using RapidMiner 5.2. Results: For the neural network, accuracy was 80.52%, precision 88.91%, and sensitivity 90.88%; for the decision tree, accuracy was 80.98%, precision 80.97%, and sensitivity 89.32%. The results indicated that both algorithms have acceptable capabilities for analyzing breast cancer data. Conclusion: Although both models provided good results, the neural network gave more reliable diagnoses for positive cases. The type of data set and the analysis method affect the results. In addition, information about stronger breast cancer risk factors, such as genetic mutations, could yield models with higher coverage.
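The three measures reported above follow from a binary confusion matrix, with malignant taken as the positive class. The sketch below is illustrative only and is not the authors' RapidMiner workflow: the `ConfusionMatrix` class, the `split_sizes` helper, and the example counts are all hypothetical, chosen so that they sum to the size of a 30% test partition of 2860 records. They do not reproduce the study's results.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' means malignant here."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted-malignant cases that really are malignant.
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        # Share of truly malignant cases that the model detected (recall).
        return self.tp / (self.tp + self.fn)


def split_sizes(n_records: int, train_fraction: float = 0.7) -> Tuple[int, int]:
    """Return (train, test) record counts for a simple holdout split."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train


# The data set size comes from the abstract; the split is 70% / 30%.
n_train, n_test = split_sizes(2860)

# Hypothetical counts for illustration only; they sum to n_test.
cm = ConfusionMatrix(tp=300, fp=60, fn=40, tn=458)

print(f"train={n_train}, test={n_test}")
print(f"accuracy={cm.accuracy():.4f}")
print(f"precision={cm.precision():.4f}")
print(f"sensitivity={cm.sensitivity():.4f}")
```

Sensitivity is the measure most relevant to the conclusion about positive cases, because it counts how many malignant records a model fails to flag.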
