Abstract

Variable screening, a fast and effective dimension reduction tool, plays an important role in analyzing ultrahigh-dimensional data. Although many real datasets contain both continuous and categorical variables, existing screening methods are mostly designed for continuous data. Partial sufficient variable screening, which aims to reduce the set of predictors of primary interest without loss of regression information in the presence of control variables, is proposed with theoretical guarantees. Specifically, for regression analyses involving mixed types of predictors, variable screening is approached under the notion of sufficiency by constraining the reduction of the continuous variables through the subpopulations identified by the categorical variables. The effectiveness of the proposed method is demonstrated through simulation studies encompassing a variety of regression and classification models, and through an application to prognostic gene screening for diffuse large-B-cell lymphoma.
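
To illustrate what screening continuous predictors within the subpopulations defined by a categorical variable can look like, the following is a minimal sketch only. It is not the screening statistic proposed in the paper: it substitutes a simple stand-in utility, the absolute within-group Pearson correlation averaged with group-proportion weights. The function name `groupwise_screen`, the cutoff `top_k`, and the simulated data are all hypothetical choices made for this illustration.

```python
import numpy as np


def groupwise_screen(X, y, w, top_k):
    """Rank continuous predictors by a group-weighted marginal utility.

    Stand-in utility (an assumption, not the paper's statistic): the absolute
    Pearson correlation between each column of X and y, computed separately
    within each level of the categorical variable w, then averaged with
    weights equal to the group proportions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    n, p = X.shape
    utility = np.zeros(p)
    for level in np.unique(w):
        mask = w == level
        n_g = int(mask.sum())
        if n_g < 3:
            continue  # too few observations to estimate a correlation
        Xg = X[mask] - X[mask].mean(axis=0)
        yg = y[mask] - y[mask].mean()
        denom = np.sqrt((Xg ** 2).sum(axis=0) * (yg ** 2).sum())
        # Columns with zero within-group variance get correlation 0.
        corr = np.divide(Xg.T @ yg, denom, out=np.zeros(p), where=denom > 0)
        utility += (n_g / n) * np.abs(corr)
    keep = np.sort(np.argsort(-utility)[:top_k])
    return keep, utility


# Simulated example (hypothetical): the effect of predictor 0 flips sign
# across the two subpopulations, so a pooled marginal correlation would be
# close to zero even though the within-group signal is strong.
rng = np.random.default_rng(0)
n, p = 400, 500
w = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
slope = np.where(w == 0, 2.0, -2.0)
y = slope * X[:, 0] + 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)

keep, utility = groupwise_screen(X, y, w, top_k=10)
print(keep)
```

Computing the utility within groups, rather than on the pooled sample, is what lets the sketch retain a predictor whose effect differs across subpopulations; any other marginal utility could be substituted in the loop.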
