Abstract

Class prediction based on high-dimensional features has received a great deal of attention in many areas of application. For example, biologists are interested in using microarray gene expression profiles for diagnosis or prognosis of a certain disease (e.g., cancer). For computational and other reasons, it is necessary to select a subset of features before fitting a statistical model, by evaluating how strongly the features are related to the response. However, such a feature selection procedure will result in overconfident predictive probabilities for future cases, because the apparent signal-to-noise ratio in the retained features is inflated by the feature selection. In this article we develop a hierarchical Bayesian classification method that can correct for this feature selection bias. Our method, which we term bias-corrected Bayesian classification with selected features (BCBCSF), uses the partial information from the feature selection procedure, in addition to the retained features, to form a correct (unbiased) posterior distribution of certain hyperparameters in the hierarchical Bayesian model that control the signal-to-noise ratio of the dataset. We take a Markov chain Monte Carlo (MCMC) approach to inferring the model parameters, and then use the MCMC samples to make predictions for future cases. Because of the simplicity of the models, the inferred parameters from MCMC are easy to interpret, and the computation is very fast. Simulation studies and tests with two real microarray datasets related to complex human diseases show that our BCBCSF method provides better predictions than two widely used high-dimensional classification methods, prediction analysis for microarrays (PAM) and diagonal linear discriminant analysis (DLDA). The R package BCBCSF for the method described here is available from http://math.usask.ca/longhai/software/BCBCSF and CRAN.
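
The feature selection bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the BCBCSF method: it assumes standard normal features that are entirely unrelated to a binary response, ranks them by the absolute difference in class means, and compares the apparent signal of the retained features on the data used for selection with their signal on an independent dataset. The function names and default settings (`selection_bias_demo`, 40 cases, 2000 features, 10 retained) are hypothetical choices made for illustration only.

```python
import random
import statistics


def simulate_null_data(n_cases, n_features, rng):
    """Draw balanced binary labels and features that carry no true signal."""
    labels = [i % 2 for i in range(n_cases)]
    features = [
        [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
        for _ in range(n_features)
    ]
    return labels, features


def mean_difference(values, labels):
    """Class-1 mean minus class-0 mean for a single feature."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    return statistics.fmean(group1) - statistics.fmean(group0)


def selection_bias_demo(n_cases=40, n_features=2000, n_keep=10, seed=1):
    """Return (apparent, honest) mean absolute class differences.

    'apparent' is measured on the data used to select the features;
    'honest' is measured for the same features on independent data.
    """
    rng = random.Random(seed)
    labels, train = simulate_null_data(n_cases, n_features, rng)
    _, fresh = simulate_null_data(n_cases, n_features, rng)

    # Score every feature on the training data and retain the top n_keep.
    train_diffs = [abs(mean_difference(f, labels)) for f in train]
    ranked = sorted(range(n_features), key=lambda j: train_diffs[j], reverse=True)
    keep = ranked[:n_keep]

    apparent = statistics.fmean([train_diffs[j] for j in keep])
    honest = statistics.fmean(
        [abs(mean_difference(fresh[j], labels)) for j in keep]
    )
    return apparent, honest


if __name__ == "__main__":
    apparent, honest = selection_bias_demo()
    print(f"apparent signal on selection data: {apparent:.3f}")
    print(f"signal on independent data:        {honest:.3f}")
```

Although no feature is truly related to the response, the retained features look strongly differentiated on the data used to select them, and that difference largely disappears on independent data. A model fitted only to the retained features, without accounting for how they were chosen, would therefore tend to overstate its confidence.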
