Abstract
Many statistical modeling procedures involve one or more tuning parameters that control model complexity. Examples include the bandwidth in kernel smoothing for nonparametric regression and density estimation, and the regularization parameter in regularization methods for feature selection in high-dimensional modeling. Tuning parameter selection plays a critical role in statistical modeling and machine learning. For massive data analysis, commonly used methods such as grid-point search with information criteria become prohibitively costly in computation, and their feasibility is questionable even on modern parallel computing platforms. This paper develops a fast algorithm to efficiently approximate the best tuning parameters. The algorithm entails (a) assuming a parametric model to describe the trend between the best tuning parameters and sample sizes, (b) establishing the trend by fitting the model to subsampled data, and (c) extrapolating this trend to the case of a huge sample size. To determine the subsample sizes to use, we derive optimal designs for settings with a constraint on the total computational budget. We show that the proposed designs possess an asymptotic optimality property. Our numerical studies demonstrate that, with a simple two-parameter polynomial model, the proposed algorithm performs almost equivalently to the procedure using the full data set in several different statistical settings, while significantly reducing computing time and storage.
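Steps (a) to (c) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two-parameter trend takes the power-law form lambda(n) = a * n^(-b), fitted by least squares on the log-log scale; the abstract does not specify the exact form. The function names, the subsample sizes, and the "best" tuning parameters below are hypothetical placeholders, not the authors' implementation or data.

```python
import numpy as np


def fit_power_trend(sizes, best_params):
    """Fit log(lambda) = log(a) - b * log(n) by least squares; return (a, b).

    Assumed trend model (not taken from the paper): lambda(n) = a * n**(-b).
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_lam = np.log(np.asarray(best_params, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(log_n, log_lam, deg=1)
    return float(np.exp(intercept)), float(-slope)


def extrapolate(a, b, n_full):
    """Evaluate the fitted trend at the full sample size."""
    return a * n_full ** (-b)


# Hypothetical inputs: in practice each value would come from a tuning
# search (e.g. a grid search with an information criterion) on a subsample.
subsample_sizes = [1_000, 2_000, 4_000, 8_000]
best_on_subsamples = [2.0 * n ** (-0.2) for n in subsample_sizes]

a_hat, b_hat = fit_power_trend(subsample_sizes, best_on_subsamples)
lam_full = extrapolate(a_hat, b_hat, 10_000_000)
```

Only the subsample fits require a tuning search; the full data set enters through its size alone, which is where the savings in computing time and storage would come from under this sketch.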