Abstract

Ensemble methods are widely used in machine learning to improve the predictive accuracy of classifiers. In this study, ensemble learning methods for lung cancer survival prediction were evaluated on the Surveillance, Epidemiology, and End Results (SEER) dataset. The data were preprocessed in several steps before the classification models were applied. Five popular ensemble methods (Bagging, Dagging, AdaBoost, MultiBoosting and Random SubSpace) were evaluated with eight classification algorithms as base classifiers (RIPPER, Decision Stump, Simple Cart, C4.5, SMO, Logistic Regression, Bayes Net and Random Forest). The risk of mortality 5 years after diagnosis was then estimated. Prediction performance was measured in terms of accuracy and area under the ROC curve (AUC). AdaBoost was the most effective of the five ensemble methods at improving the performance of the base classifiers. It increased the accuracy of RIPPER from 88.88% to 88.98%, of Decision Stump from 81.21% to 87.67% and of SMO from 83.41% to 87.16%. AdaBoost also increased the AUC of RIPPER from 91.5% to 94.9%, of Decision Stump from 81.2% to 93.9%, of C4.5 (J48) from 94.1% to 94.9% and of SMO from 50.0% to 92.1%. Random SubSpace performed worst among the ensemble techniques used in this study. The results show empirically that ensemble methods can improve the performance of their base classifiers and are appropriate for the analysis of cancer survival.
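As a rough illustration of the kind of comparison summarized above, the following is a minimal sketch of discrete AdaBoost over decision stumps, scored by accuracy and AUC. It is not the study's implementation and does not use SEER data. The synthetic two-feature dataset, the helper names (`fit_stump`, `adaboost`, `make_data`, and so on) and the round counts are all assumptions made for illustration only.

```python
import math
import random


def stump_predict(stump, row):
    """Predict +1 or -1 from a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds):
    """Discrete AdaBoost; labels must be +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified rows, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Weighted vote of all stumps; the sign is the predicted class."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)


def accuracy(model, X, y):
    hits = sum((1 if score(model, row) > 0 else -1) == yi
               for row, yi in zip(X, y))
    return hits / len(y)


def auc(model, X, y):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half; both classes must be present in y.
    """
    pos = [score(model, row) for row, yi in zip(X, y) if yi == 1]
    neg = [score(model, row) for row, yi in zip(X, y) if yi == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, seed):
    """Hypothetical synthetic data: the class depends on a diagonal boundary."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if a + b > 1.0 else -1 for a, b in X]
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_data(120, seed=0)
    X_test, y_test = make_data(120, seed=1)
    for rounds in (1, 20):  # 1 round is a single stump; 20 rounds is boosted
        model = adaboost(X_train, y_train, rounds)
        print(rounds, accuracy(model, X_test, y_test),
              auc(model, X_test, y_test))
```

Running the script prints held-out accuracy and AUC for a single stump and for a 20-round ensemble. A single stump can only produce two distinct scores, so its AUC is limited by ties, whereas the boosted vote produces a graded score. This may be one reason a weak base learner can gain more in AUC than in accuracy, although the sketch says nothing about the figures reported in the study.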
