In the context of minimax theory, we propose a new kind of risk, normalized by a random variable that is measurable with respect to the data. We present a notion of optimality and a method to construct optimal procedures accordingly. We apply this general setup to the problem of selecting significant variables in Gaussian white noise. In particular, we show that our method essentially improves the accuracy of estimation, in the sense of giving explicit improved confidence sets in $L_2$-norm. Links to adaptive estimation are discussed.

1. Introduction. Searching for significant variables is certainly one of the oldest and most popular problems in statistics. One of the simplest models in which the issue of selecting significant variables was first stated mathematically is linear regression. A vast literature has been devoted to this topic since, and different approaches have been proposed over the last forty years, both for estimation and for hypothesis testing. Among many authors, we refer to Akaike [1], Breiman and Freedman [3], Chernoff [5], Csiszár and Körner [6], Dychakov [10], Patel [42], Rényi [46], Freidlina [13], Meshalkin [35], Malyutov and Tsitovich [34], Schwarz [47] and Stone [48].

In classical parametric regression, if we consider a linear model, we first have to measure the possible gain of "searching for a limited number of significant variables." If the model comes from a specific field of application, then only an adequate description of the problem, together with its solution, is relevant. From a mathematical point of view, however, a theory of selecting significant variables does not lead, at least asymptotically, to a substantial improvement in the accuracy of estimation: in a regular parametric model, the classical $\sqrt{n}$ rate of convergence is not affected by the number of significant variables. (Even in this setup, let us emphasize that "asymptotically" has to be understood as "up to a constant," and that the correct choice of significant variables may improve this constant.)

If instead of a linear model we consider a nonparametric regression model, the search for significant variables becomes crucial for estimating the regression function: the rate of convergence depends explicitly on the set of significant variables. Let us develop this statement with the following example of multivariate regression: suppose we observe $Z^{(n)} = (X_i, Y_i,\ i = 1, \ldots, n)$ in the model
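The excerpt breaks off at this point, so the display below is only a plausible completion: a standard multivariate regression model consistent with the stated observations, followed by the classical rates that illustrate the preceding claim. The smoothness parameter $\beta$, the dimensions $d$ and $d'$, and the rate symbols $\psi_n$, $\psi_n'$ are assumptions of this sketch, not notation taken from the paper.

% A hedged sketch, not the authors' display: a standard regression model
% consistent with "we observe Z^(n) = (X_i, Y_i, i = 1, ..., n)".
\[
  Y_i = f(X_i) + \xi_i, \qquad i = 1, \ldots, n,
\]
% with f an unknown regression function on [0,1]^d and the \xi_i i.i.d.
% centered errors (the noise assumption is ours, not the excerpt's).
%
% Classical illustration of the claim that the rate of convergence depends
% on the set of significant variables: if f has Holder smoothness beta in
% all d variables, the minimax L_2 rate is
\[
  \psi_n = n^{-\beta/(2\beta + d)},
\]
% whereas if f in fact depends only on d' < d of the coordinates (the
% significant variables), the attainable rate improves to
\[
  \psi_n' = n^{-\beta/(2\beta + d')}.
\]

Since $d' < d$, the second exponent is strictly larger in absolute value, which is the quantitative sense in which identifying the significant variables improves the accuracy of estimation in the nonparametric case.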
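To make the abstract's central notion concrete, here is a minimal schematic of a risk normalized by a data-dependent random variable. The notation is assumed for illustration ($\hat f_n$ an estimator, $\rho_n = \rho_n(Z^{(n)})$ a random normalizing factor, $\Sigma$ a class of functions, $w$ a loss function); it indicates the general shape of such a risk, not the paper's exact definition.

% Schematic only: a risk in which the accuracy of \hat f_n is measured
% relative to a normalizing factor \rho_n that is itself a random
% variable, measurable with respect to the data Z^(n).
\[
  \mathcal{R}_n(\hat f_n, \rho_n)
  = \sup_{f \in \Sigma}
    \mathbb{E}_f\!\left[ w\!\left( \rho_n^{-1}\, \| \hat f_n - f \|_2 \right) \right].
\]
% Taking \rho_n deterministic recovers the usual minimax risk with a
% fixed rate normalization; allowing \rho_n to depend on the data is
% what permits the improved confidence sets mentioned in the abstract.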