Abstract

Data mining is an active research topic with applications in many aspects of life. Healthcare and medicine are among the fields that have attracted data mining researchers seeking to solve decision-making problems. Cancer diagnosis, treatment, and prediction have made use of data mining for decades. In Yemen, some cancer risk factors seem to differ from those in other parts of the world. By mining the data available at the National Cancer Control Foundation (NCCF), useful knowledge has been extracted. In this paper, decision tree classification was selected for building a model to predict cancer risk factors. Because the NCCF database contains data describing aspects of social life, environmental circumstances, lifestyle, and so on, mining those data can contribute to efforts to clear up the ambiguity about cancer risk factors in Yemen. The informative attributes selected for model building were gender, marital status, number of family members, province, chewing Qat, chewing tobacco (Shamaa), smoking, age, relatives with cancer, and cancer class. The data were prepared following the knowledge discovery (KDD) process and then fed into the C4.5 learning algorithm. The results showed that smoking, chewing tobacco (Shamaa), province of residence, marital status, and age are the most important cancer risk factors. The resulting model was found to have high performance. In addition, the rules extracted from the model tree can be of high value to both the public and the healthcare sector.
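
As an illustration of how C4.5 ranks candidate attributes, the following minimal Python sketch computes the gain ratio, the split criterion C4.5 uses when choosing which attribute to test at a tree node. It is not the implementation used in this paper. The function names, the attribute keys, the class labels "A" and "B", and the four toy records are all hypothetical and invented for the example; they are not NCCF data and say nothing about the actual results.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of `attribute` with respect to `target`, as used by C4.5."""
    total = len(rows)
    base = entropy([row[target] for row in rows])

    # Group the target labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])

    # Expected entropy of the target after splitting on the attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = base - remainder

    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, invented purely to exercise the functions.
toy_rows = [
    {"smoking": "yes", "gender": "male", "cancer_class": "A"},
    {"smoking": "yes", "gender": "female", "cancer_class": "A"},
    {"smoking": "no", "gender": "male", "cancer_class": "B"},
    {"smoking": "no", "gender": "female", "cancer_class": "B"},
]

# Rank the candidate attributes from most to least informative.
ranked = sorted(
    ["smoking", "gender"],
    key=lambda name: gain_ratio(toy_rows, name, "cancer_class"),
    reverse=True,
)
print(ranked)
```

In a full C4.5 run, the attribute with the highest gain ratio becomes the test at the current node, and the procedure repeats on each resulting subset; continuous attributes such as age would additionally need a threshold search, which this sketch omits.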
