Abstract

We consider a Gaussian sequence space model $X_{\lambda}=f_{\lambda}+\xi_{\lambda}$, where the noise variables $(\xi_{\lambda})_{\lambda}$ are independent but have heterogeneous variances $(\sigma_{\lambda}^{2})_{\lambda}$. Our goal is to estimate the unknown signal vector $(f_{\lambda})$ by a model selection approach. We focus on the situation where the non-zero entries $f_{\lambda}$ are sparse. In this setting the heterogeneous case is much more involved than the homogeneous model, where $\sigma_{\lambda}^{2}=\sigma^{2}$ is constant. Indeed, we can no longer profit from symmetry inside the stochastic process that one needs to control. The problem and the penalty depend not only on the number of coefficients that one selects, but also on their positions. This also appears in the minimax bounds, where the worst coefficients go to the larger variances. With a careful and explicit choice of the penalty, however, we are able to select the correct coefficients and obtain a sharp non-asymptotic control of the risk of our procedure. Some finite sample results from simulations are provided.

Highlights

  • The index set Λ is finite but large. This heterogeneous model may appear in several frameworks where the variance fluctuates, for example in heterogeneous regression, coloured noise, fractional Brownian motion models and, especially, statistical inverse problems.

  • The goal here is to estimate the unknown parameter vector from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end, a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a model selection procedure and results in a sparse oracle-type inequality.

  • The potential loss of the factor 2 in the heterogeneous framework is possibly avoidable in theory, but in simulations the results seem comparatively less sensitive to this factor than to other modifications, e.g. to how many data points, among the $n\gamma_{n}$ non-zero coefficients, are close to the critical threshold level, which defines some kind of effective sparsity of the problem. This effect is not treated in the theoretical setup of most studies related to false discovery rate (FDR) control, where a worst-case scenario for the coefficients' magnitude is implicitly understood.


Summary

Motivation and main results

The index set Λ is finite but large. This heterogeneous model may appear in several frameworks where the variance fluctuates, for example in heterogeneous regression, coloured noise, fractional Brownian motion models and, especially, statistical inverse problems. The goal here is to estimate the unknown parameter vector $(f_{\lambda})$ from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end, a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a (data-dependent) model selection procedure and results in a sparse oracle-type inequality.
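To make the setting concrete, the model and a data-driven selection rule can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the variance profile, the signal amplitude, the sample size and the helper names (`simulate`, `select`, `loss`) are hypothetical, and the selection rule is plain coordinate-wise hard thresholding at level $\sigma_{\lambda}\sqrt{2\log n}$, used here as a stand-in and not the risk hull penalty proposed in the paper.

```python
import math
import random


def simulate(f, sigma, rng):
    """Draw X_lambda = f_lambda + xi_lambda with xi_lambda ~ N(0, sigma_lambda^2)."""
    return [fl + rng.gauss(0.0, sl) for fl, sl in zip(f, sigma)]


def select(x, sigma, kappa=2.0):
    """Keep index lambda when |X_lambda| > sigma_lambda * sqrt(kappa * log n).

    Coordinate-wise hard thresholding: an illustrative stand-in,
    NOT the risk hull penalty of the paper.
    """
    level = math.sqrt(kappa * math.log(len(x)))
    return [abs(xl) > sl * level for xl, sl in zip(x, sigma)]


def loss(f, x, h):
    """Squared loss of the estimator f_hat_lambda = h_lambda * X_lambda."""
    return sum((xl - fl) ** 2 if keep else fl ** 2
               for fl, xl, keep in zip(f, x, h))


rng = random.Random(0)
n = 1000

# Hypothetical heterogeneous noise levels, growing with the index.
sigma = [0.05 + 0.45 * lam / (n - 1) for lam in range(n)]

# Hypothetical sparse signal: 20 non-zero coefficients of equal size.
f = [0.0] * n
for lam in rng.sample(range(n), 20):
    f[lam] = 1.0

x = simulate(f, sigma, rng)
h = select(x, sigma)
oracle = [fl != 0.0 for fl in f]  # selection rule that knows the true support

print("selected coefficients:", sum(h))
print("loss of thresholding :", loss(f, x, h))
print("loss of oracle rule  :", loss(f, x, oracle))
```

Because the threshold scales with $\sigma_{\lambda}$, a coefficient of a given size is easier to detect at a low-variance index than at a high-variance one, which is one way to see why the position of the selected coefficients matters and not only their number.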

Examples
Data-driven-subset selection
Discussion
Minimax bounds
A numerical example
Proofs