Abstract

Colorectal cancer data from the SEER program are analyzed with the aim of using filtering techniques to improve the performance of association rule models. This paper proposes improving the quality of the dataset by removing its outliers with the Hidden Naive Bayes (HNB), Naive Bayes Tree (NBTree) and Reduced Error Pruning Decision Tree (REPTree) algorithms. The Apriori and HotSpot algorithms are then applied to mine association rules between the 13 selected attributes and average survival. Experimental results show that HNB filtering can improve the accuracy of the Apriori algorithm by up to 100% and its support threshold up to 45%. It can also improve the accuracy of the HotSpot algorithm up to 93.38% and its support threshold up to 80%. The HotSpot rules with a minimum support of 80% are therefore selected for explanation. These rules show that colorectal cancer patients who died from colon cancer and did not receive radiation therapy were associated with survival of less than 22 months. Our study shows that filtering in the preprocessing stage is a useful approach to enhancing the quality of the dataset. This finding could help researchers build models with better predictive performance. Although heuristic, such analysis can be very useful for identifying the factors that affect survival. It can also help medical practitioners explain to patients the risks involved in a particular treatment procedure.
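To make the notion of a support threshold concrete, the following minimal sketch computes support and confidence, the standard measures for an association rule, over a handful of records. The records, attribute names, and values are hypothetical and are not taken from the SEER data; the abstract does not define how accuracy was measured, so confidence is shown here only as the usual rule-quality measure, not as the paper's metric.

```python
from typing import Dict, List

Record = Dict[str, str]


def support(records: List[Record], items: Dict[str, str]) -> float:
    """Fraction of records that contain every attribute=value pair in `items`."""
    if not records:
        return 0.0
    matches = sum(
        1 for r in records if all(r.get(k) == v for k, v in items.items())
    )
    return matches / len(records)


def confidence(
    records: List[Record],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(records, antecedent)
    if antecedent_support == 0:
        return 0.0
    return support(records, {**antecedent, **consequent}) / antecedent_support


# Hypothetical toy records; attribute names and values are illustrative only.
records = [
    {"radiation": "none", "cause_of_death": "colon", "survival": "short"},
    {"radiation": "none", "cause_of_death": "colon", "survival": "short"},
    {"radiation": "given", "cause_of_death": "other", "survival": "long"},
    {"radiation": "none", "cause_of_death": "colon", "survival": "long"},
]

antecedent = {"radiation": "none", "cause_of_death": "colon"}
consequent = {"survival": "short"}

rule_support = support(records, {**antecedent, **consequent})
rule_confidence = confidence(records, antecedent, consequent)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is kept only if its support meets the chosen minimum, so a cleaner dataset that lets rules survive at a higher threshold yields rules that cover more of the patient records.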
