Abstract

Many bioinformatics studies aim to find features that differentiate between two or more classes. Recent work proposes a Bayesian framework for feature selection that places a prior on the label-conditioned feature distribution. Under the assumption of independent features, this framework yields the optimal Bayesian filter, which has previously been solved for Gaussian features. Here we extend the optimal Bayesian filter to categorical data, compare it with several algorithms in synthetic simulations, and apply it to breast and colon cancer SNP datasets. For the real datasets, we select the top SNPs, find the genes that map to them, and perform enrichment analysis. A literature review suggests that many of the top genes and pathways are involved in cancer.

