Abstract
Variable screening, as a fast and effective dimension reduction tool, plays an important role in analyzing ultrahigh-dimensional data. Although many real datasets contain both continuous and categorical variables, existing screening methods are mostly designed for continuous data. Partial sufficient variable screening is proposed, with theoretical guarantees; it aims to reduce the set of predictors of primary interest without loss of regression information in the presence of control variables. Specifically, for regression analyses involving mixed types of predictors, variable screening is approached under the notion of sufficiency by constraining the reduction of the continuous variables through the subpopulations identified by the categorical variables. The effectiveness of the proposed method is demonstrated through simulation studies encompassing a variety of regression and classification models, and through an application to prognostic gene screening for diffuse large B-cell lymphoma.
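To make the idea of screening continuous predictors within subpopulations concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a single categorical control variable `w`, uses the absolute Pearson correlation as a stand-in marginal utility (the abstract does not specify the actual screening statistic), and averages that utility over the subgroups defined by `w`, weighted by subgroup size. The function name `stratified_marginal_screen` and the parameter `top_k` are hypothetical.

```python
import numpy as np


def stratified_marginal_screen(X, y, w, top_k):
    """Rank continuous predictors by a subgroup-weighted marginal utility.

    Illustrative assumption only: the utility is the absolute Pearson
    correlation between each column of X and y, computed separately
    within each level of the categorical variable w and then averaged
    with weights n_g / n. This is NOT the statistic from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    n, p = X.shape
    scores = np.zeros(p)

    for level in np.unique(w):
        mask = w == level
        n_g = int(mask.sum())
        if n_g < 3:
            # Too few observations to estimate a correlation; skip.
            continue
        # Center within the subgroup so correlations are subgroup-specific.
        Xg = X[mask] - X[mask].mean(axis=0)
        yg = y[mask] - y[mask].mean()
        denom = np.sqrt((Xg ** 2).sum(axis=0)) * np.sqrt((yg ** 2).sum())
        corr = np.zeros(p)
        ok = denom > 0  # guard against constant columns or constant y
        corr[ok] = (Xg[:, ok].T @ yg) / denom[ok]
        scores += (n_g / n) * np.abs(corr)

    # Indices of the top_k predictors, largest score first.
    keep = np.argsort(-scores)[:top_k]
    return keep, scores


# Toy data (synthetic, for illustration): predictor 0 has opposite
# effects in the two subgroups, so a pooled correlation would largely
# cancel, while the within-subgroup utility does not.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
w = rng.integers(0, 2, size=n)
sign = np.where(w == 0, 1.0, -1.0)
y = 2.0 * sign * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=n)

keep, scores = stratified_marginal_screen(X, y, w, top_k=10)
print(sorted(keep.tolist()))
```

Computing the utility within each subgroup, rather than on the pooled sample, is what lets the sketch retain a predictor whose effect changes sign across the levels of the categorical variable. Any other marginal utility could be substituted for the correlation without changing the structure.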