Abstract
We consider a Gaussian sequence space model $X_{\lambda}=f_{\lambda}+\xi_{\lambda}$, where the noise variables $(\xi_{\lambda})_{\lambda}$ are independent, but with heterogeneous variances $(\sigma_{\lambda}^{2})_{\lambda}$. Our goal is to estimate the unknown signal vector $(f_{\lambda})$ by a model selection approach. We focus on the situation where the non-zero entries $f_{\lambda}$ are sparse. In this setting the heterogeneous case is much more involved than the homogeneous model, where $\sigma_{\lambda}^{2}=\sigma^{2}$ is constant. Indeed, we can no longer profit from symmetry inside the stochastic process that one needs to control. The problem and the penalty depend not only on the number of coefficients that one selects, but also on their positions. This also appears in the minimax bounds, where the worst coefficients will go to the larger variances. With a careful and explicit choice of the penalty, however, we are able to select the correct coefficients and get a sharp non-asymptotic control of the risk of our procedure. Some finite sample results from simulations are provided.
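The observation model can be made concrete with a small simulation. The following is a minimal sketch only: it draws a sparse signal under position-dependent noise levels and applies coordinate-wise hard thresholding at the level $\sigma_{\lambda}\sqrt{2\log n}$, a standard universal-type threshold used here as a stand-in. It is not the penalty proposed in the paper, and the names `simulate` and `hard_threshold`, the linear noise profile and the signal size are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(n, k, signal_size, sigma, rng):
    """Draw X = f + xi with independent N(0, sigma_lambda^2) noise.

    Exactly k of the n entries of f are non-zero (hypothetical setup).
    """
    f = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    f[support] = signal_size
    x = f + sigma * rng.standard_normal(n)
    return f, x


def hard_threshold(x, sigma):
    """Keep X_lambda only where |X_lambda| > sigma_lambda * sqrt(2 log n).

    The threshold scales with the local noise level, so whether a
    coefficient is kept depends on its position, not just its size.
    """
    n = x.size
    thresholds = sigma * np.sqrt(2.0 * np.log(n))
    keep = np.abs(x) > thresholds
    return np.where(keep, x, 0.0), keep


rng = np.random.default_rng(0)
n = 1000
sigma = np.linspace(0.5, 2.0, n)  # assumed noise profile, increasing in lambda
f, x = simulate(n, k=20, signal_size=15.0, sigma=sigma, rng=rng)
f_hat, keep = hard_threshold(x, sigma)

# Squared-error loss of the thresholded estimate.
loss = float(np.sum((f_hat - f) ** 2))
print(f"selected {int(keep.sum())} coefficients, loss = {loss:.2f}")
```

Because the threshold is proportional to $\sigma_{\lambda}$, a coefficient of fixed size is easier to detect at low-variance positions than at high-variance ones, which illustrates why position matters in the heterogeneous case.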
Highlights
Λ is a finite but large index set. This heterogeneous model may appear in several frameworks where the variance is fluctuating, for example in heterogeneous regression, coloured noise, fractional Brownian motion models or especially in statistical inverse problems.
The goal here is to estimate the unknown parameter vector $(f_{\lambda})$ from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end, a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a model selection procedure and results in a sparse oracle-type inequality.
The potential loss of the factor 2 in the heterogeneous framework is possibly avoidable in theory, but in simulations the results seem less sensitive to this factor than to other modifications, e.g. to how many data points, among the $n\gamma_n$ non-zero coefficients, are close to the critical threshold level, which defines some kind of effective sparsity of the problem. This effect is not treated in the theoretical setup of most studies related to false discovery rate (FDR) control, where a worst-case scenario for the coefficients' magnitude is implicitly assumed.
Summary
Λ is a finite but large index set. This heterogeneous model may appear in several frameworks where the variance is fluctuating, for example in heterogeneous regression, coloured noise, fractional Brownian motion models or especially in statistical inverse problems. The goal here is to estimate the unknown parameter vector $(f_{\lambda})$ from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end, a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a (data-dependent) model selection procedure and results in a sparse oracle-type inequality.