Abstract

This paper presents a new feature screening method for ultra-high dimensional data with responses missing at random. The distribution function of the missing response is completed by imputation, and the distance correlation between the imputed distribution function of the response and the distribution function of each covariate is then used as the screening index. The proposed method has three advantages. First, it is nonparametric and model-free, and can detect nonlinear relationships between variables. Second, it is robust to covariates with outliers and heavy-tailed distributions. Third, it can handle multi-dimensional response variables directly. Under certain assumptions, this paper establishes the sure screening and ranking consistency properties of the procedure. Simulation studies examine the performance of the proposed procedure and compare it with existing methods. Finally, the method is applied to a diffuse large B-cell lymphoma data set.
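
The screening index described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the imputation step, so this sketch assumes a simple complete-case stand-in (rows with a missing response are dropped), and it assumes a scalar response. The function names `ecdf_values`, `distance_correlation`, and `screen_features` are hypothetical and chosen for illustration only.

```python
import numpy as np


def ecdf_values(x):
    """Map each observation to its empirical CDF value F_n(x_i)."""
    x = np.asarray(x, dtype=float)
    # Number of observations <= x_i, divided by the sample size.
    return np.searchsorted(np.sort(x), x, side="right") / x.size


def distance_correlation(u, v):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(u[:, None] - u[None, :])
    b = np.abs(v[:, None] - v[None, :])
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, y, top_k):
    """Rank covariates by distance correlation between ECDF-transformed values.

    ASSUMPTION: missing responses (NaN) are handled by complete-case
    analysis, a stand-in for the imputation step that the abstract
    mentions but does not specify.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    fy = ecdf_values(y[observed])
    scores = np.array([
        distance_correlation(ecdf_values(X[observed, j]), fy)
        for j in range(X.shape[1])
    ])
    ranked = np.argsort(-scores)  # indices from largest to smallest score
    return ranked[:top_k], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(200)  # nonlinear signal
    y[rng.random(200) < 0.2] = np.nan                   # responses missing at random
    selected, _ = screen_features(X, y, top_k=5)
    print(selected)
```

Because each variable enters only through its empirical distribution function, the scores are unchanged by any strictly increasing transformation of a covariate, which is one way to see why an index of this kind is insensitive to outliers and heavy tails.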
