Abstract
The small-n-large-P situation has become common in genetics research, medical studies, risk management, and other fields. Feature selection is crucial in these studies yet poses a serious challenge. Traditional criteria such as AIC, BIC, and cross-validation choose too many features. In this paper, we examine the variable selection problem under generalized linear models. We study the extended Bayes information criterion (EBIC), an approach in which the prior takes specific account of the small-n-large-P situation. The criterion is shown to be variable selection consistent under generalized linear models. We also report simulation results and a data analysis to illustrate the effectiveness of EBIC for feature selection.

In many scientific investigations, researchers explore the relationship between a response variable and some explanatory features through a random sample. Examples of such features include disease genes and quantitative trait loci in the human genome, biomarkers responsible for disease pathways, and stocks generating profits in investment portfolios. The selection of causal features is a crucial aspect of such investigations. When the sample size n is relatively small but the number of features P under consideration is extremely large, selecting the causal features becomes a serious challenge.

Feature selection in the sense of identifying causal features is different from, but often interwoven with, model selection. The latter involves two operational components: a procedure for selecting candidate models, and a criterion for assessing the candidate models. In this article, we concentrate on the issue of model selection criteria.

Traditional model selection criteria such as Akaike's information criterion (AIC) (Akaike (1973)), cross-validation (CV) (Stone (1974)) and generalized cross-validation (GCV) (Craven and Wahba (1979)) essentially address the prediction accuracy of selected models.
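To make the contrast between these criteria concrete, the following is a minimal sketch of how AIC, BIC, and an EBIC-style score penalize model size. It is not taken from this text: the EBIC form used here (BIC plus 2γ times the log of the number of models of the same size) is one commonly cited form and should be read as an assumption, as should the parameter name gamma and every number in the example, which is purely hypothetical.

```python
import math


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k), via log-gamma so large p is safe."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def aic(loglik, k):
    """AIC: -2 * log-likelihood + 2 * (number of fitted parameters)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """BIC: -2 * log-likelihood + k * log(sample size)."""
    return -2.0 * loglik + k * math.log(n)


def ebic(loglik, k, n, p, gamma=1.0):
    """Assumed EBIC form: BIC plus 2 * gamma * log C(p, k).

    The extra term grows with the number of candidate models of size k
    among p features; gamma = 0 recovers ordinary BIC.
    """
    return bic(loglik, k, n) + 2.0 * gamma * log_binom(p, k)


# Hypothetical illustration: n = 100 observations, P = 1000 candidate features.
# Model A uses 2 features; model B uses 10 features and fits somewhat better.
n, p = 100, 1000
models = {"A (2 features)": (-60.0, 2), "B (10 features)": (-40.0, 10)}

for name, (loglik, k) in models.items():
    print(name,
          "AIC=%.2f" % aic(loglik, k),
          "BIC=%.2f" % bic(loglik, k, n),
          "EBIC=%.2f" % ebic(loglik, k, n, p))
```

With these made-up numbers, AIC and BIC both score the larger model B lower (better), while the EBIC-style score favours the smaller model A, because its penalty also depends on how many features were available to choose from.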
The popular Bayes information criterion (BIC) (Schwarz (1978)) was developed from the Bayesian paradigm in a different vein. BIC approximates the posterior model probability when the prior is