Abstract

Cancer is one of the leading causes of death worldwide. Identifying the risk factors associated with different types of cancer can help researchers understand how cancer develops and find new ways of preventing the disease. Most research on cancer datasets focuses on only one type of cancer. This research aims to provide a new methodology for extracting significant influential factors affecting multiple cancer types by employing frequent pattern mining, association rule mining, and contrast set mining techniques. The datasets used cover the US general population and were collected from the National Health Interview Survey (NHIS) and the Surveillance, Epidemiology, and End Results (SEER) Program. The discovered rules contribute in two ways: some validate existing knowledge about cancer, and a few open further research directions that could enrich expert knowledge in the cancer domain. Experimental results show that high cholesterol and high blood pressure are evident among cancer patients. In terms of demographics, females and people aged 61 to 85 are more prone to cancer. In addition, respondents in the Hispanic-origin category “not Hispanic/Spanish origin” make up the majority of cancer patients. This research is one of the few works that applies to multiple cancer types, and its methodology for finding dominant factors associated with cancer is unique.
