Abstract

Prostate cancer is the fourth most common cancer overall and the second most common cancer in men. Its incidence is rising faster than overall cancer incidence, and 68% of prostate cancer cases occur in developed countries. There has been very little research on which techniques are most suitable for analysing prostate cancer gene expression datasets to identify the genes most likely to be related to prostate cancer. This paper attempts to identify significant (influential) attributes in a well-established prostate cancer gene expression dataset of 12,533 attributes for 102 samples (50 normal, 52 tumour). Seven different statistical and artificial intelligence (AI)-based feature selection methods were paired with four classifiers: artificial neural networks (ANNs), Naive Bayes, AdaBoost and J48. Prediction experiments were carried out using ANNs tested on unseen samples. In our classification experiments, ANNs combined with sequential forward feature selection (SFFS) outperformed all other approaches, achieving 100% accuracy. Naive Bayes and AdaBoost achieved their best accuracies, 96.3% and 93.13% respectively, with support vector machine (SVM) attribute selection, whereas J48 reached only 89.21%, with SFFS. In the prediction experiments, ANNs obtained an accuracy of 95.1% with SVM attribute selection (correctly predicting 96 out of 102 samples). Finally, a search of the National Center for Biotechnology Information (NCBI) database showed that 21 of the 24 attributes (87.5%) selected by SVM attribute selection have a reference to cancer/tumour, thereby establishing a link between feature selection and biological plausibility. The main contribution of this paper is to show the importance of pairing the most appropriate feature selection strategy with the most appropriate classification strategy when dealing with significantly underdetermined data, where the attributes far outnumber the samples.
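
The abstract does not describe how the feature selection search was implemented. As a minimal sketch of the general idea behind sequential forward feature selection, the Python code below greedily adds one attribute at a time, keeping whichever attribute most improves a classifier's score, and stops when no attribute helps or a size limit is reached. Everything here is an illustrative assumption: the function names, the stopping rule, the toy data, and the scoring classifier (leave-one-out accuracy of a simple nearest-centroid classifier, a hypothetical stand-in for the ANN, Naive Bayes, AdaBoost and J48 classifiers used in the paper). Note also that "SFFS" sometimes denotes sequential floating forward selection, which adds a conditional backward step that this sketch omits.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    max_features: int,
) -> List[int]:
    """Greedy sequential forward selection (no floating/backtracking step).

    Starts from an empty subset and repeatedly adds the single feature whose
    inclusion gives the highest score. Stops when max_features features are
    selected or when no remaining feature improves the best score so far.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every remaining feature added to the current subset;
        # ties go to the lower feature index.
        cand_score, cand = max(
            ((score(selected + [f]), f) for f in candidates),
            key=lambda t: (t[0], -t[1]),
        )
        if cand_score <= best_score:
            break  # adding any feature does not help, so stop early
        selected.append(cand)
        best_score = cand_score
    return selected


def loo_nearest_centroid_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], features: List[int]
) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Hypothetical stand-in for the paper's classifiers. Each class needs at
    least two samples so that a centroid exists when one sample is held out.
    """
    correct = 0
    for i in range(len(X)):
        distances = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[f] for r in rows) / len(rows) for f in features]
            distances[label] = sum(
                (X[i][f] - c) ** 2 for f, c in zip(features, centroid)
            )
        predicted = min(distances, key=distances.get)
        correct += int(predicted == y[i])
    return correct / len(X)


# Invented toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 3.0],
    [0.0, 3.0, 2.0],
    [1.0, 5.0, 3.0],
    [0.9, 1.0, 1.0],
    [1.1, 3.0, 2.0],
]
y = [0, 0, 0, 1, 1, 1]

chosen = sequential_forward_selection(
    n_features=3,
    score=lambda fs: loo_nearest_centroid_accuracy(X, y, fs),
    max_features=2,
)
print(chosen)  # → [0]
```

Because the score is produced by a classifier, the selected subset depends on which classifier is plugged in, which is one reason the pairing of feature selection method and classifier matters.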
This paper also highlights the differences and similarities between classification and prediction results for prostate cancer. We also tried a further approach in the classification and prediction experiments: in addition to the seven feature selection methods, we derived new attribute sets by combining all selected attributes (union), selecting the attributes the methods have in common (intersection), and taking the remaining attributes (not common).
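
The abstract does not specify whether the intersection is taken across all seven methods or only some of them. Assuming it is taken across every method being combined, and reading the "not common" set as the union minus the intersection, the three derived attribute sets can be sketched with Python sets as below. The method name method_C and the attribute identifiers g1 to g5 are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Set, Tuple


def combine_attribute_sets(
    selections: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (union, intersection, not_common) of several attribute sets.

    not_common is the union minus the intersection, i.e. attributes chosen
    by at least one method but not by every method (an assumed reading).
    """
    sets = list(selections.values())
    if not sets:
        return set(), set(), set()
    union = set().union(*sets)
    intersection = set.intersection(*sets)
    not_common = union - intersection
    return union, intersection, not_common


# Hypothetical selections: "method_C" and all attribute IDs are placeholders.
selections = {
    "SVM": {"g1", "g2", "g3"},
    "SFFS": {"g2", "g3", "g4"},
    "method_C": {"g3", "g5"},
}
union, intersection, not_common = combine_attribute_sets(selections)
print(sorted(union))         # → ['g1', 'g2', 'g3', 'g4', 'g5']
print(sorted(intersection))  # → ['g3']
print(sorted(not_common))    # → ['g1', 'g2', 'g4', 'g5']
```

Each derived set can then be passed to a classifier in the same way as the output of a single feature selection method.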