Abstract
For ultrahigh-dimensional data, variable screening is an important step that reduces the scale of the problem and hence improves estimation accuracy and efficiency. In this paper, we propose a new dependence measure, called the log odds ratio statistic, for use under the sufficient variable screening framework. The sufficient variable screening approach ensures the sufficiency of the selected input features in modeling the regression function and is an enhancement of existing marginal screening methods. In addition, we propose an ensemble variable screening approach that combines the proposed fused log odds ratio filter with the fused Kolmogorov filter to achieve superior performance by taking advantage of both filters. We establish the sure screening properties of the fused log odds ratio filter for both marginal variable screening and sufficient variable screening. Extensive simulations and a real data analysis are provided to demonstrate the usefulness of the proposed log odds ratio filter and the sufficient variable screening procedure.
Highlights
Ultrahigh-dimensional data have emerged recently in many areas of modern scientific research, including microarray, genomic, proteomic, brain imaging, and genetic data.
Under the assumption that only a small number of the observed input features contribute to the response variable (these are usually referred to as active features), [10] propose the sure independence screening (SIS) method to identify a subset of features that contains the active features.
We show that the fused log odds ratio filter enjoys sure screening properties for both marginal screening and sufficient variable screening.
Summary
Ultrahigh-dimensional data have emerged recently in many areas of modern scientific research, including microarray, genomic, proteomic, brain imaging, and genetic data. We show that the proposed log odds ratio statistic can be used for variable screening and that the log odds ratio filter is fully nonparametric and model-free. It is invariant under monotone transformations of the features. The log odds ratio filter can be applied to data in which the response variable and the input features are either discrete or continuous. Because each filter has advantages in different situations, the proposed fused log odds ratio filter can be combined with the fused Kolmogorov filter under an ensemble approach, so that the two complement each other and achieve better performance. Additional remarks and technical proofs are included in the appendix.
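The definition of the log odds ratio statistic is not given in this summary, so it is not reproduced here. As a rough illustration of the general idea of marginal screening and of invariance under monotone transformations, the following sketch implements a plain (unfused) Kolmogorov filter for a binary response: each feature is scored by the two-sample Kolmogorov-Smirnov distance between its class-conditional empirical distributions, and the top-ranked features are retained. The function names, the simulated data, and the screening size d = n / log(n) (a common convention in the screening literature) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def kolmogorov_filter(X, y):
    """Score each column of X by the two-sample Kolmogorov-Smirnov distance
    between its empirical distributions in the classes y == 0 and y == 1.
    Illustrative only; this is NOT the paper's log odds ratio statistic."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        x0 = np.sort(X[y == 0, j])
        x1 = np.sort(X[y == 1, j])
        grid = np.concatenate([x0, x1])  # the supremum is attained at a data point
        F0 = np.searchsorted(x0, grid, side="right") / x0.size  # empirical CDF, class 0
        F1 = np.searchsorted(x1, grid, side="right") / x1.size  # empirical CDF, class 1
        scores[j] = np.max(np.abs(F0 - F1))
    return scores

def screen(X, y, d):
    """Return the indices of the d features with the largest scores."""
    return np.argsort(-kolmogorov_filter(X, y))[:d]

# Hypothetical simulated data: only feature 0 depends on the response.
rng = np.random.default_rng(0)
n, p = 200, 1000
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
X[:, 0] += 2.0 * y

d = int(n / np.log(n))  # assumed screening size
selected = screen(X, y, d)

# The scores depend only on the ordering of each feature's values,
# so a strictly increasing transformation such as exp leaves them unchanged.
scores_raw = kolmogorov_filter(X, y)
scores_exp = kolmogorov_filter(np.exp(X), y)
```

Because the score uses only ranks within each feature, no model for the regression function is needed, which is the sense in which filters of this kind are called model-free.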