Abstract

Background and Objective: Micro- or macro-level mapping of cancer statistics is a challenging task that requires long-term planning, prospective studies and continuous monitoring of all cancer cases. The objective of the current study is to show how cancer registry data can be processed with data mining techniques in order to improve the outcomes of statistical analysis.

Methods: Data were collected from the Cancer Registry of Crete in Greece (counties of Rethymno and Lasithi) for the period 1998–2004. Data collection was performed on paper forms that were manually transcribed to a single data file, which introduced errors and noise (e.g. missing and erroneous values, duplicate entries). Data were pre-processed and prepared for analysis using data mining tools and algorithms. Feature selection was applied to evaluate the contribution of each collected feature to predicting patients' survival. Several classifiers were trained and evaluated for their ability to predict patient survival. Finally, statistical analysis of cancer morbidity and mortality rates in the two regions was performed in order to validate the initial findings.

Results: Several critical points in the process of collecting, preprocessing and analyzing cancer data were derived from the results, and a road-map for future population data studies was developed. In addition, increased morbidity rates were observed in the counties of Crete (Age Standardized Morbidity/Incidence Rates, ASIR = 396.45 ± 2.89 and 274.77 ± 2.48 for men and women, respectively) compared to European and world averages (ASIR = 281.6 and 207.3 for men and women in Europe, and 203.8 and 165.1 worldwide). Significant variation in cancer types between sexes and age groups was also observed; for example, the ratio of deaths to reported cases was 0.055 for patients younger than 34 years, compared with 0.366 for patients over 75 years.

Conclusions: This study introduced a methodology for preprocessing and analyzing cancer data using a combination of data mining techniques, which could be a useful tool for other researchers and for the further enhancement of cancer registries.
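
For readers unfamiliar with the age-standardized rates quoted in the Results, the following is a minimal sketch of direct age standardization, the usual way such rates are computed. The abstract does not state which standard population or software the authors used, so the function name `age_standardized_rate`, the three age groups, the counts and the weights below are all hypothetical and serve only to illustrate the arithmetic.

```python
def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Direct age standardization (illustrative sketch, not the authors' code).

    cases        -- observed cases in each age group
    person_years -- person-years at risk in each age group
    std_weights  -- standard-population weights for the same age groups
    per          -- scaling factor (rates are usually given per 100,000)
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    rate = 0.0
    for c, py, w in zip(cases, person_years, std_weights):
        # Age-specific rate, weighted by that group's share of the standard population
        rate += (c / py) * (w / total_weight)
    return rate * per


# Hypothetical toy data: three age groups, NOT taken from the registry
toy_cases = [2, 10, 40]
toy_person_years = [10_000, 10_000, 10_000]
toy_weights = [0.5, 0.3, 0.2]  # made-up standard-population shares

toy_asir = age_standardized_rate(toy_cases, toy_person_years, toy_weights)
print(f"Toy ASIR per 100,000: {toy_asir:.1f}")
```

Because each age-specific rate is weighted by a fixed standard population, rates computed this way can be compared across regions with different age structures, which is what makes the comparison with European and world averages meaningful.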
