Abstract

Background: Breast cancer is the leading cause of cancer-related deaths among women in Iran.

Objectives: The aim of the present study was to compare traditional statistical analysis and data mining techniques as research methods for identifying prognostic factors for the survival time of patients with breast cancer. The decision tree method is one of the predictive models used in the medical field. The most widely used algorithms are classification and regression trees (CART), the quick, unbiased, efficient statistical tree (QUEST), the chi-square automatic interaction detector (CHAID), and the C5.0 algorithm.

Methods: We used data from 438 patients who were referred to the cancer research center of Shahid Beheshti University of Medical Sciences. The patients were visited and treated between 1992 and 2012 and followed up until October 2014. The data were analyzed using logistic regression and decision tree methods. Six measures were used to evaluate the predictive performance of the different models.

Results: The C5.0 algorithm performed better than the CHAID, QUEST, and CART algorithms and logistic regression in predicting breast cancer survival. The multiple logistic regression results indicated that age at diagnosis, histologic grade, axillary lymph node status, and type of surgery were significantly associated with the probability of death in patients with breast cancer. Moreover, based on C4.5, tumor size, age at menarche, hormonal therapy, axillary nodal status, and histological grade were reported to be the most prominent variables.

Conclusions: More precise methods can identify more accurate predictors. The decision tree method predicted the probability of death more accurately than conventional logistic regression. Improvements to the classical classification tree, such as boosting and bagging, have been developed to obtain better predictive performance. We suggest that future studies in the breast cancer context focus on modern classification tree methods.
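
The abstract does not name the six measures used to compare the models. Purely as an illustration, the Python sketch below assumes six common confusion-matrix measures (accuracy, sensitivity, specificity, positive and negative predictive value, and F1 score), with death coded as 1 and survival as 0. The function names and the choice of measures are assumptions, not the study's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # deaths correctly predicted as deaths
    fp: int  # survivors incorrectly predicted as deaths
    tn: int  # survivors correctly predicted as survivors
    fn: int  # deaths incorrectly predicted as survivors


def counts_from_labels(y_true, y_pred):
    """Tally a 2x2 confusion matrix; 1 = death, 0 = survival (assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return ConfusionCounts(tp, fp, tn, fn)


def safe_div(num, den):
    # Return NaN instead of raising when a measure is undefined (empty denominator).
    return num / den if den else float("nan")


def performance_measures(c):
    """Six hypothetical evaluation measures computed from confusion counts."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall for deaths
    specificity = safe_div(c.tn, c.tn + c.fp)  # recall for survivors
    ppv = safe_div(c.tp, c.tp + c.fp)          # positive predictive value
    npv = safe_div(c.tn, c.tn + c.fn)          # negative predictive value
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    print(performance_measures(counts_from_labels(y_true, y_pred)))
```

The same two functions can be applied to the predictions of each model (for example a logistic regression and each tree algorithm) on a common test set, so the measures are directly comparable across models.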