Abstract

Cancer class prediction and discovery can improve on imperfect, non-automated cancer diagnosis, which directly affects patient treatment. Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling automatic, precise and early diagnosis. A promising application of SAGE gene expression data is the classification of cancers. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE gene expression profiles. The event-model-based methods are compared with the standard Naive Bayes method. Both binary and multicategory classification are investigated. Experimental results on several SAGE datasets show that the event models generally outperform standard Naive Bayes. Normalized Information Gain (NIG), an extension of Information Gain (IG), is proposed for gene selection. The impact of gene correlation on classification performance is also investigated.
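To make the difference between two of the event models named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes each SAGE library is a vector of tag counts, uses Laplace smoothing, and treats the function names (`train`, `log_score`, `predict`), the class labels and the toy counts as purely illustrative. The normalized multinomial model and NIG are not defined in this abstract, so they are not sketched here.

```python
import math

def train(profiles, labels):
    """Estimate Laplace-smoothed parameters for both event models (illustrative)."""
    classes = sorted(set(labels))
    n_tags = len(profiles[0])
    model = {}
    for c in classes:
        rows = [p for p, y in zip(profiles, labels) if y == c]
        # Multivariate Bernoulli: probability that a tag is present in a class-c library.
        present = [sum(1 for r in rows if r[t] > 0) for t in range(n_tags)]
        bern = [(1 + k) / (2 + len(rows)) for k in present]
        # Multinomial: share of all class-c tag counts that belongs to each tag.
        totals = [sum(r[t] for r in rows) for t in range(n_tags)]
        mult = [(1 + k) / (n_tags + sum(totals)) for k in totals]
        model[c] = {"prior": len(rows) / len(profiles), "bern": bern, "mult": mult}
    return model

def log_score(model, c, x, event_model):
    """Log prior plus log likelihood of profile x under class c."""
    m = model[c]
    score = math.log(m["prior"])
    for t, count in enumerate(x):
        if event_model == "bernoulli":
            # Only presence or absence of the tag matters.
            p = m["bern"][t]
            score += math.log(p if count > 0 else 1 - p)
        else:
            # Each observed copy of the tag contributes once.
            score += count * math.log(m["mult"][t])
    return score

def predict(model, x, event_model):
    """Return the class with the highest score under the chosen event model."""
    return max(model, key=lambda c: log_score(model, c, x, event_model))

# Toy data (hypothetical): 4 libraries, 3 tags.
profiles = [[12, 0, 3], [9, 1, 0], [0, 7, 5], [1, 10, 4]]
labels = ["normal", "normal", "cancer", "cancer"]
model = train(profiles, labels)

print(predict(model, [10, 0, 1], "multinomial"))
print(predict(model, [0, 8, 6], "bernoulli"))
```

The Bernoulli variant discards count magnitudes and also scores absent tags, while the multinomial variant weights each tag by how often it was observed; that is the usual distinction between these two event models.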
