Abstract

Advances in medical technology have increased the volume of medical data. Data mining has become an essential tool for hospital management and medical research because it allows accumulated data to be managed and potentially meaningful information to be extracted from databases for use in further research. This study compared three data mining methods and attempted to identify the determining factors of breast cancer. The data were collected from a hospital in Taiwan between 2002 and 2010 and comprised 1357 cases and 7 variables. The dataset was divided into 10 groups of training and testing sets. Three popular data mining algorithms (decision tree C5.0, support vector machine (SVM), and logistic regression) were used to predict whether patients survived or died. Decision tree C5.0 outperformed both SVM and logistic regression. On the training sets, it achieved a classification accuracy of 95.8%, with a sensitivity of 97.7% and a specificity of 94.7%. On the testing sets, it achieved a classification accuracy of 94.9%, with a sensitivity of 95.7% and a specificity of 94.3%. These results suggest that, of the three models compared, decision tree C5.0 is the most suitable for prognosis in clinical practice. Our findings may provide a reference for doctors to identify new cases of breast cancer.
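
For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally derived from the counts of a binary confusion matrix. It is an illustration only, not the study's code: the class `ConfusionCounts`, the function names, and the example counts are hypothetical, and which outcome (survival or death) is treated as the "positive" class is an assumption the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # positive cases predicted as positive
    fn: int  # positive cases predicted as negative
    tn: int  # negative cases predicted as negative
    fp: int  # negative cases predicted as positive


def accuracy(c: ConfusionCounts) -> float:
    # Share of all cases that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly positive cases that the model detected.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly negative cases that the model correctly rejected.
    return c.tn / (c.tn + c.fp)


# Made-up counts for illustration; these are NOT the study's data.
example = ConfusionCounts(tp=90, fn=10, tn=180, fp=20)

if __name__ == "__main__":
    print(f"accuracy:    {accuracy(example):.3f}")
    print(f"sensitivity: {sensitivity(example):.3f}")
    print(f"specificity: {specificity(example):.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the two outcome classes are unequal in size, because accuracy alone can look high even if one class is predicted poorly.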
