Sparse model selection under heterogeneous noise: Exact penalisation and data-driven thresholding

Laurent Cavalier, Markus Reiß

doi:10.1214/14-ejs889

Abstract

We consider a Gaussian sequence space model $X_{\lambda}=f_{\lambda}+\xi_{\lambda}$, where the noise variables $(\xi_{\lambda})_{\lambda}$ are independent, but with heterogeneous variances $(\sigma_{\lambda}^{2})_{\lambda}$. Our goal is to estimate the unknown signal vector $(f_{\lambda})$ by a model selection approach. We focus on the situation where the non-zero entries $f_{\lambda}$ are sparse. Then the heterogeneous case is much more involved than the homogeneous model where $\sigma_{\lambda}^{2}=\sigma^{2}$ is constant. Indeed, we can no longer profit from symmetry inside the stochastic process that one needs to control. The problem and the penalty depend not only on the number of coefficients that one selects, but also on their positions. This also appears in the minimax bounds, where the worst coefficients will go to the larger variances. With a careful and explicit choice of the penalty, however, we are able to select the correct coefficients and get a sharp non-asymptotic control of the risk of our procedure. Some finite sample results from simulations are provided.
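To make the setting concrete, the following is a minimal simulation sketch of the sequence model with position-dependent noise, followed by a simple coordinate-wise hard-thresholding rule. It is not the penalisation procedure of the paper: the variance profile, the signal strength, the function names `simulate` and `hard_threshold`, and the threshold constant $\sqrt{2\log n}$ are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(n=1000, n_nonzero=20, signal=8.0, seed=0):
    """Draw X = f + xi with independent N(0, sigma_lambda^2) noise."""
    rng = np.random.default_rng(seed)
    # Hypothetical variance profile: the noise level grows with the index.
    sigma = np.linspace(0.5, 2.0, n)
    f = np.zeros(n)
    # Sparse signal: only a few randomly placed non-zero coefficients.
    support = rng.choice(n, size=n_nonzero, replace=False)
    # Assumption: non-zero entries are scaled to the local noise level.
    f[support] = signal * sigma[support]
    x = f + sigma * rng.standard_normal(n)
    return x, f, sigma


def hard_threshold(x, sigma, kappa=None):
    """Keep X_lambda only where |X_lambda| > kappa * sigma_lambda."""
    if kappa is None:
        # Universal-threshold constant, used here as an illustrative default.
        kappa = np.sqrt(2.0 * np.log(x.size))
    keep = np.abs(x) > kappa * sigma
    return np.where(keep, x, 0.0), keep


x, f, sigma = simulate()
f_hat, keep = hard_threshold(x, sigma)

loss = np.sum((f_hat - f) ** 2)      # squared loss of the thresholded estimate
naive_loss = np.sum((x - f) ** 2)    # squared loss when every coefficient is kept
print(f"selected {keep.sum()} coefficients")
print(f"thresholded loss: {loss:.1f}, unthresholded loss: {naive_loss:.1f}")
```

Because the threshold scales with $\sigma_{\lambda}$, the selection depends on where a coefficient sits and not only on how many coefficients are kept, which is the feature of the heterogeneous problem that the abstract emphasises.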

Highlights

$\Lambda$ is a finite, but large index set. This heterogeneous model may appear in several frameworks where the variance is fluctuating, for example in heterogeneous regression, coloured noise, fractional Brownian motion models or especially in statistical inverse problems.
The goal here is to estimate the unknown parameter vector from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a model selection procedure and results in a sparse oracle-type inequality.
The potential loss of the factor 2 in the heterogeneous framework is possibly avoidable in theory, but in simulations the results seem comparably less sensitive to this factor than to other modifications, e.g. to how many data points, among the $n\gamma_{n}$ non-zero coefficients, are close to the critical threshold level, which defines some kind of effective sparsity of the problem. This effect is not treated in the theoretical setup in most of the false discovery rate control (FDR)-related studies, where implicitly a worst case scenario of the coefficients’ magnitude is understood.

Summary

Motivation and main results

$\Lambda$ is a finite, but large index set. This heterogeneous model may appear in several frameworks where the variance is fluctuating, for example in heterogeneous regression, coloured noise, fractional Brownian motion models or especially in statistical inverse problems. The goal here is to estimate the unknown parameter vector $(f_{\lambda})$ from the observations $(X_{\lambda})$ under general and unknown sparsity constraints. To this end a penalised empirical risk criterion, based on the so-called risk hull approach, is proposed for general families of possibly data-driven selection rules. This can be viewed as a (data-dependent) model selection procedure and results in a sparse oracle-type inequality.
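As a worked illustration of why a penalty enters the criterion, consider the simplest case of a fixed (non-random) selection rule $h\in\{0,1\}^{\Lambda}$ with estimator $\hat f_{\lambda}=h_{\lambda}X_{\lambda}$. The notation $h$ and $\hat f$ is an assumption introduced here for illustration; the display below is the standard risk computation for such a rule and is not the risk hull or the penalty of the paper.

```latex
% Risk of a fixed selection rule h: squared bias on the discarded
% coefficients plus variance on the selected ones.
\mathbb{E}\sum_{\lambda\in\Lambda}(\hat f_{\lambda}-f_{\lambda})^{2}
  = \sum_{\lambda\in\Lambda}(1-h_{\lambda})f_{\lambda}^{2}
  + \sum_{\lambda\in\Lambda}h_{\lambda}\sigma_{\lambda}^{2}.

% Since E[X_\lambda^2] = f_\lambda^2 + \sigma_\lambda^2, an unbiased
% estimate of this risk is, up to a term that does not depend on h,
-\sum_{\lambda\in\Lambda}h_{\lambda}X_{\lambda}^{2}
  + 2\sum_{\lambda\in\Lambda}h_{\lambda}\sigma_{\lambda}^{2}.
```

The second term depends on which coefficients are selected through their variances $\sigma_{\lambda}^{2}$, not only on how many are selected. For data-driven rules $h$ the first identity no longer holds as stated, so a larger, carefully chosen penalty is needed; its exact form is not reproduced here.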

Examples

Data-driven-subset selection

Discussion

Minimax bounds

A numerical example

Proofs

Journal: Electronic Journal of Statistics	Publication Date: Jan 1, 2014
Citations: 8	License type: cc-by

Similar Papers

A Comparison of Mixed‐Model Analyses of the Iowa Crop Performance Test for Corn
Yoon‐Sup So ... Jode Edwards
Crop Science | VOL. 49
01 Sep 2009

Variance-Component Based Sparse Signal Reconstruction and Model Selection
Kun Qiu ... Aleksandar Dogandzic
IEEE Transactions on Signal Processing | VOL. 58
01 Jun 2010

Model selection and multimodel inference for standardizing catch rates of bycatch species: a case study of oceanic whitetip shark in the Hawaii-based longline fishery
Jon Brodziak ... William A Walsh
Canadian Journal of Fisheries and Aquatic Sciences | VOL. 70
01 Dec 2013

On oracle inequalities related to data-driven hard thresholding
Golubev Yuri
Probability Theory and Related Fields | VOL. 150
16 Mar 2010
