Abstract

The problem considered here is that of using a data-driven procedure to select a good estimate from a class of linear estimates indexed by a discrete parameter. In contrast to other papers on this subject, we consider models with heteroskedastic errors. The results apply to model selection problems in linear regression and to nonparametric regression estimation via series estimators, nearest-neighbor estimators, and local regression estimators, among others. Generalized C_L (GC_L), cross-validation (CV), and generalized cross-validation (GCV) procedures are analyzed. The GC_L and CV criteria are shown to be asymptotically optimal under general conditions. A feasible version of GC_L, however, is available only in some applications. The GCV criterion is found to be asymptotically optimal only under a condition that is satisfied in some applications but not in others. For example, it is satisfied in the nearest-neighbor estimation context but not in the series estimation, local regression estimation, or model selection contexts. Thus, the CV criterion is the only feasible criterion of the three that is asymptotically optimal under general conditions. The proofs rely heavily on results of Li (1987).
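To make the criteria concrete, the following is a minimal numerical sketch of CV and GCV selection among linear estimators, under illustrative assumptions not taken from the paper: a polynomial series estimator indexed by its order, a heteroskedastic noise design, and hypothetical function names. For any linear estimate ŷ = M(k)y, leave-one-out CV admits the shortcut formula using the diagonal of M(k), while GCV replaces each diagonal entry by the average tr(M(k))/n.

```python
import numpy as np

def smoother_matrix(x, k):
    """Hat matrix M(k) for a polynomial series estimator of order k.
    The fitted values M(k) @ y are linear in y, as the setting requires."""
    X = np.vander(x, k + 1, increasing=True)        # basis 1, x, ..., x^k
    return X @ np.linalg.solve(X.T @ X, X.T)

def cv_score(y, M):
    """Leave-one-out CV criterion via the shortcut formula:
    mean of ((y_i - yhat_i) / (1 - M_ii))^2."""
    resid = y - M @ y
    return np.mean((resid / (1.0 - np.diag(M))) ** 2)

def gcv_score(y, M):
    """GCV criterion: replaces each M_ii by the average tr(M)/n."""
    n = len(y)
    resid = y - M @ y
    return np.mean(resid ** 2) / (1.0 - np.trace(M) / n) ** 2

# Illustrative data with heteroskedastic errors (variance grows with |x|).
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(-1.0, 1.0, n))
sigma = 0.1 + 0.4 * np.abs(x)
y = np.sin(3.0 * x) + sigma * rng.standard_normal(n)

# Select the polynomial order (the discrete index) by each criterion.
orders = range(1, 11)
Ms = {k: smoother_matrix(x, k) for k in orders}
k_cv = min(orders, key=lambda k: cv_score(y, Ms[k]))
k_gcv = min(orders, key=lambda k: gcv_score(y, Ms[k]))
print("CV picks order", k_cv, "| GCV picks order", k_gcv)
```

In this series-estimation context the abstract's result says CV is asymptotically optimal while GCV need not be, though on any single simulated sample the two selected orders may or may not coincide.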
