Colon cancer is one of the most common cancers worldwide, accounting for about 10% of cancer deaths. Predicting the onset of cancer and identifying the causes of its development can be of enormous help and relief to affected patients, helping them return to a “normal” life. Data mining and machine learning are important intelligent tools for classification, prediction, and extracting hidden relationships in patient data. We collected data from Shahid Faghihi Hospital in Shiraz. The collected features include gender, age, duration of cancer before surgery, number of bathroom visits, use of the anti-inflammatory drug prednisolone, duration and dosage of drug use, type of surgery, number of consultations, surgical retreatment, incontinence, and others. After pre-processing and data cleaning, effective features were extracted, and the occurrence of cancer was predicted using different classification algorithms. Association rule mining algorithms such as Apriori were then used to uncover hidden relationships among the entries. Among the algorithms evaluated, the support vector machine achieved the highest prediction accuracy (84%); because the dataset was unbalanced, we chose a cost-sensitive support vector machine. In addition, applying the Apriori algorithm to the dataset features yielded the conditions associated with non-inflammation. Some significant findings follow. If the time since surgical treatment or diagnosis was less than 5 years, the likelihood of developing colon cancer is lower. Also, as the duration of the disease increases, the likelihood of reoperation increases, as confirmed by the internists. Because this problem, with these features, is addressed here for the first time, at the suggestion of internists, early detection of cancer and the extraction of effective rules can be of help to the medical community.
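The abstract does not describe how its association-rule step was implemented. As a minimal sketch, assuming each patient record has first been discretised into a set of categorical items, the level-wise Apriori search (join, prune, count support) and confidence-based rule generation can be written in plain Python. The item names such as `duration<5y` and the five records are hypothetical toy data, not the hospital dataset, and the support and confidence thresholds are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of records that contain every item in the itemset
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items
    current = {}
    for item in {i for t in sets for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unite pairs of frequent (k-1)-itemsets that give exactly k items
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = s / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, s, conf))
    return out


# Hypothetical discretised patient records (toy data for illustration only)
records = [
    {"duration<5y", "no_cancer", "prednisolone"},
    {"duration<5y", "no_cancer"},
    {"duration>=5y", "cancer", "reoperation"},
    {"duration>=5y", "cancer", "reoperation"},
    {"duration<5y", "no_cancer", "reoperation"},
]
frequent = apriori(records, min_support=0.4)
for lhs, rhs, s, c in rules(frequent, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Support measures how often a pattern occurs, while confidence measures how reliably the left-hand side implies the right-hand side; a rule such as `duration<5y -> no_cancer` is the kind of pattern that would correspond to the 5-year finding reported above.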
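The idea behind a cost-sensitive support vector machine is to weight the hinge loss per class so that errors on the rare class cost more, i.e. to minimise (λ/2)‖w‖² + (1/n) Σ c_{y_i} max(0, 1 − y_i(w·x_i + b)). The sketch below is one way to realise this and assumes a linear kernel, inverse-frequency costs c_y = n / (n_classes · n_y), labels in {−1, +1}, and plain stochastic subgradient descent; the kernel, cost values and solver used in the study are not stated in the abstract, and the toy data are made up.

```python
import random


def class_weights(labels):
    """Inverse-frequency costs n / (n_classes * count_c): the rarer class gets the larger cost."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def train_cost_sensitive_svm(X, y, lam=0.01, epochs=200, lr=0.01, seed=0):
    """Linear SVM (labels in {-1, +1}) trained by stochastic subgradient descent on
    lam/2 * ||w||^2 + cost[y_i] * max(0, 1 - y_i * (w.x_i + b))."""
    rng = random.Random(seed)
    cost = class_weights(y)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The L2 regulariser contributes lam * w on every step (the bias is not regularised)
            grad_w = [lam * wj for wj in w]
            grad_b = 0.0
            if margin < 1:
                # Hinge term is active: its subgradient is scaled by the class cost
                c = cost[y[i]]
                grad_w = [gw - c * y[i] * xj for gw, xj in zip(grad_w, X[i])]
                grad_b = -c * y[i]
            w = [wj - lr * gw for wj, gw in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy unbalanced data: 8 negatives, 2 positives (hypothetical values)
X = [[-2.0], [-1.8], [-1.6], [-1.4], [-1.2], [-1.0], [-0.8], [-0.6], [1.5], [2.0]]
y = [-1] * 8 + [1] * 2
w, b = train_cost_sensitive_svm(X, y)
print(w, b, predict(w, b, [2.0]))
```

For real use, a library solver is preferable; for example, scikit-learn's `SVC(class_weight="balanced")` applies the same n / (n_classes · n_y) weighting and also supports non-linear kernels.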
In future work, we plan to improve accuracy by enlarging the dataset and adding colonoscopy image features.