Abstract

Achieving excellent prediction accuracy in the high-dimensional partially linear model is particularly important, but it is difficult due to the complex relationship between the nonparametric covariates and the response. Irrelevant covariates and uninformative observations also commonly degrade the model's prediction performance, and high-dimensional data analysis remains challenging for modern statistical studies. To overcome these difficulties, we propose the Double Sparsity Garrotized Kernel Machine (DSGKM) method, together with an efficient algorithm and an adjusted version for prediction. Specifically, we estimate the nonparametric components using the kernel machine technique and simultaneously impose L1-norm penalties to select relevant covariates and retain representative data points in the final model. In addition, we conduct a convergence analysis of the adjusted algorithm. The advantages of our method are: (i) it sufficiently captures the complex relationship between the nonparametric covariates and the response; (ii) it identifies relevant covariates and selects representative data; and (iii) it achieves higher computational efficiency, especially when both the parametric and nonparametric components are high-dimensional. Results on both simulated and real data show that the proposed method outperforms existing methods, even in the presence of outliers.
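To make the setting concrete, the sketch below fits a partially linear model y = Xβ + h(Z) + ε by a simple backfitting scheme: a Lasso (L1-penalized) step estimates the sparse linear component, and a kernel ridge step estimates the nonparametric component, echoing the kernel machine and L1-penalty ingredients described above. This is a generic hedged illustration, not the authors' DSGKM algorithm; the model sizes, penalty levels, and the `h(Z)` form are invented for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.kernel_ridge import KernelRidge

# Simulated partially linear data (all settings are illustrative).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))                 # parametric covariates
Z = rng.uniform(-1, 1, size=(n, 2))         # nonparametric covariates
beta = np.zeros(p)
beta[:2] = [1.5, -2.0]                      # sparse linear effects
y = X @ beta + np.sin(np.pi * Z[:, 0]) + Z[:, 1] ** 2 \
    + 0.1 * rng.normal(size=n)

# Backfitting: alternate an L1-penalized fit of the linear part on
# y - h(Z) with a kernel machine fit of h on the remaining residuals.
lin = Lasso(alpha=0.05)
ker = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0)
h = np.zeros(n)
for _ in range(20):
    lin.fit(X, y - h)                       # sparse linear component
    ker.fit(Z, y - lin.predict(X))          # nonparametric component
    h = ker.predict(Z)

print(lin.coef_.round(2))                   # near-zero outside the first two entries
```

The L1 penalty drives irrelevant linear coefficients toward zero, while the kernel step absorbs the nonlinear effect of Z; DSGKM additionally places L1 penalties inside a garrotized kernel to select nonparametric covariates and representative observations.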
