Abstract

Many statistical modeling procedures involve one or more tuning parameters to control the model complexity. These tuning parameters can be the bandwidth in the kernel smoothing method for nonparametric regression and density estimation, or the regularization parameter in the regularization method for feature selection in high-dimensional modeling. Tuning parameter selection plays a critical role in statistical modeling and machine learning. For massive data analysis, commonly used methods such as grid-point search with information criteria become prohibitively costly in computation; their feasibility is questionable even on modern parallel computing platforms. This paper aims to develop a fast algorithm that efficiently approximates the best tuning parameters. The algorithm entails (a) assuming a parametric model to describe the trend between the best tuning parameters and the sample size, (b) establishing the trend by fitting the model with subsampled data, and (c) extrapolating this trend to the case of a huge sample size. To determine the subsample sizes to be taken, we derive optimal designs for settings that impose a constraint on the total computational budget. We show that the proposed designs possess an asymptotic optimality property. Our numerical studies demonstrate that with a simple two-parameter polynomial model, the proposed algorithm performs almost equivalently to the procedure using the full data set in several different statistical settings, while significantly reducing computing time and storage.
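The steps (a)–(c) above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a ridge-regression setting with a cross-validated grid search standing in for the tuning-parameter selector, and a two-parameter log-linear trend model (all names and parameter choices here are our own illustrative assumptions):

```python
# Hypothetical sketch of the subsample-and-extrapolate idea: select a ridge
# regularization parameter on a few small subsamples by grid search, fit the
# two-parameter trend log(lambda*) = b0 + b1 * log(n), and extrapolate the
# trend to the full sample size. All settings here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def best_lambda(X, y, grid):
    """Grid-search the ridge penalty by 5-fold cross-validated MSE."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), 5)
    def cv_mse(lam):
        err = 0.0
        for f in folds:
            mask = np.ones(n, dtype=bool)
            mask[f] = False
            Xt, yt = X[mask], y[mask]
            beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]),
                                   Xt.T @ yt)
            err += np.mean((y[f] - X[f] @ beta) ** 2)
        return err
    return min(grid, key=cv_mse)

# Simulated "massive" data set.
N, p = 20000, 10
X = rng.standard_normal((N, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_normal(N)

# Step (b): find the best lambda on each of a few small subsamples.
grid = np.logspace(-4, 2, 40)
sizes = [200, 400, 800, 1600]
lams = [best_lambda(X[idx], y[idx], grid)
        for idx in (rng.choice(N, n, replace=False) for n in sizes)]

# Steps (a) + (c): fit the two-parameter trend and extrapolate to n = N.
b1, b0 = np.polyfit(np.log(sizes), np.log(lams), 1)
lambda_full = np.exp(b0 + b1 * np.log(N))
print(f"extrapolated lambda for n={N}: {lambda_full:.4g}")
```

Only the small subsamples are ever touched by the grid search, so the cost is driven by the subsample sizes rather than by N; the paper's optimal designs concern how those sizes should be chosen under a total-cost budget.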
