Abstract

Recent work proposes a Bayesian hierarchical model for feature selection in which a prior describes both the identity of feature sets and their underlying class-conditional distributions. In this work, we consider this model under an independence assumption on both the feature identities and the class-conditional distribution of each feature. This framework yields optimal Bayesian feature filtering, with closed-form solutions that apply to high-dimensional data at low computational cost. In addition, the model provides feedback on the quality of any feature set via closed-form estimators of the expected number of good features missed and the expected number of bad features selected. Synthetic data simulations show outstanding performance of the optimal Bayesian feature filter relative to other popular feature selection methods. We also observe robustness with respect to the model's assumptions, particularly the independence assumption, and robustness under non-informative priors on the underlying class-conditional distributions. Furthermore, applying the optimal Bayesian feature filter to gene expression microarray datasets produces gene lists in which markers with known links to the cancer in question are highly ranked.
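
To make the filtering idea concrete, the sketch below scores each feature independently by comparing two closed-form marginal likelihoods: one where the two classes share a single distribution (a "bad" feature) and one where each class has its own (a "good" feature). This is a minimal illustration, not the paper's exact model: it assumes Gaussian class-conditional distributions with conjugate Normal-Inverse-Gamma priors, and the hyperparameters (m0, k0, a0, b0) and the prior probability that a feature is good are illustrative choices.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(x, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Log marginal likelihood of x under a Gaussian model with a conjugate
    Normal-Inverse-Gamma prior (hyperparameters here are illustrative)."""
    n = len(x)
    xbar = np.mean(x)
    ss = np.sum((x - xbar) ** 2)  # within-sample sum of squares
    kn = k0 + n
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (xbar - m0) ** 2 / (2.0 * kn)
    return (-0.5 * n * np.log(2 * np.pi)
            + 0.5 * np.log(k0 / kn)
            + gammaln(an) - gammaln(a0)
            + a0 * np.log(b0) - an * np.log(bn))

def posterior_prob_good(x0, x1, prior_good=0.5):
    """Posterior probability that a feature is 'good' (class-conditional
    distributions differ) versus 'bad' (classes share one distribution)."""
    log_good = log_marginal_likelihood(x0) + log_marginal_likelihood(x1)
    log_bad = log_marginal_likelihood(np.concatenate([x0, x1]))
    log_odds = (np.log(prior_good) + log_good) - (np.log(1.0 - prior_good) + log_bad)
    return 1.0 / (1.0 + np.exp(-log_odds))  # sigmoid of the log posterior odds

# Filter: score every feature independently; cost is linear in dimension.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(30, 5))  # class 0, 5 features
X1 = rng.normal(0.0, 1.0, size=(30, 5))  # class 1
X1[:, 0] += 2.0                          # feature 0 is truly 'good'
scores = [posterior_prob_good(X0[:, j], X1[:, j]) for j in range(X0.shape[1])]
print(np.round(scores, 3))
```

Under this reading, the feedback estimators mentioned above also fall out directly: for a selected set A, summing 1 minus the posterior probability of being good over features in A estimates the expected number of bad features selected, and summing the posterior probabilities over features outside A estimates the expected number of good features missed.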
