Adaptive estimation in multivariate response regression with hidden variables

Xin Bing, Yaosheng Xu, Yang Ning

doi:10.1214/21-aos2059

Abstract

A prominent concern of scientific investigators is the presence of unobserved hidden variables in association analysis. Ignoring hidden variables often yields biased statistical results and misleading scientific conclusions. Motivated by this practical issue, this paper studies the multivariate response regression with hidden variables, $Y = (\Psi^*)^T X + (B^*)^T Z + E$, where $Y \in \mathbb{R}^m$ is the response vector, $X \in \mathbb{R}^p$ is the observable feature, $Z \in \mathbb{R}^K$ represents the vector of unobserved hidden variables, possibly correlated with $X$, and $E$ is an independent error. The number of hidden variables $K$ is unknown and both $m$ and $p$ are allowed, but not required, to grow with the sample size $n$. Though $\Psi^*$ is shown to be nonidentifiable due to the presence of hidden variables, we propose to identify the projection of $\Psi^*$ onto the orthogonal complement of the row space of $B^*$, denoted by $\Theta^*$. The quantity $(\Theta^*)^T X$ measures the effect of $X$ on $Y$ that cannot be explained through the hidden variables, and thus $\Theta^*$ is treated as the parameter of interest. Motivated by the identifiability proof, we propose a novel estimation algorithm for $\Theta^*$, called HIVE, under homoscedastic errors. The first step of the algorithm estimates the best linear prediction of $Y$ given $X$, in which the unknown coefficient matrix exhibits an additive decomposition of $\Psi^*$ and a dense matrix due to the correlation between $X$ and $Z$. Under the sparsity assumption on $\Psi^*$, we propose to minimize a penalized least squares loss by regularizing $\Psi^*$ and the dense matrix via group-lasso and multivariate ridge, respectively. Nonasymptotic deviation bounds of the in-sample prediction error are established. Our second step estimates the row space of $B^*$ by leveraging the covariance structure of the residual vector from the first step. In the last step, we estimate $\Theta^*$ via projecting $Y$ onto the orthogonal complement of the estimated row space of $B^*$ to remove the effect of hidden variables.
Nonasymptotic error bounds of our final estimator of $\Theta^*$, which are valid for any $m$, $p$, $K$ and $n$, are established. We further show that, under mild assumptions, the rate of our estimator matches the best possible rate with known $B^*$ and is adaptive to the unknown sparsity of $\Theta^*$ induced by the sparsity of $\Psi^*$. The model identifiability, estimation algorithm and statistical guarantees are further extended to the setting with heteroscedastic errors. Thorough numerical simulations and two real data examples are provided to back up our theoretical results.
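
The three steps described in the abstract can be illustrated with a small simulation. The following is only a rough sketch under simplifying assumptions and is not the authors' HIVE implementation: step 1 uses ordinary least squares in place of the group-lasso plus ridge penalized fit (reasonable only when p is small relative to n), K is treated as known, and the hidden variables are generated as a linear function of X plus noise. The target is taken to be Θ∗ = Ψ∗(I − P_B), with P_B the projection onto the row space of B∗, which is one reading of the abstract's definition. The names `simulate`, `hive_sketch`, `A` and `P_perp` are hypothetical.

```python
import numpy as np


def simulate(n=5000, p=5, m=20, K=2, sigma=0.5, seed=0):
    """Draw data from Y = Psi^T X + B^T Z + E, with rows as samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Assumption: Z depends linearly on X, so Z is correlated with X.
    A = rng.standard_normal((p, K))
    Z = X @ A + rng.standard_normal((n, K))
    # Row-sparse Psi*: only the first two features act on Y directly.
    Psi = np.zeros((p, m))
    Psi[:2] = rng.standard_normal((2, m))
    B = rng.standard_normal((K, m))
    E = sigma * rng.standard_normal((n, m))  # homoscedastic errors
    Y = X @ Psi + Z @ B + E
    # Theta* = Psi* (I - P_B), P_B = projection onto the row space of B.
    P_B = B.T @ np.linalg.solve(B @ B.T, B)
    Theta = Psi @ (np.eye(m) - P_B)
    return X, Y, Theta


def hive_sketch(X, Y, K):
    """Simplified three-step estimate of Theta* (OLS variant, K known)."""
    n, m = Y.shape
    # Step 1 (simplified): best linear prediction of Y given X via OLS.
    F_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ F_hat
    # Step 2: the top-K eigenvectors of the residual covariance
    # estimate the row space of B.
    cov = residuals.T @ residuals / n
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    V = eigvecs[:, -K:]
    P_perp = np.eye(m) - V @ V.T
    # Step 3: project Y onto the orthogonal complement, regress on X.
    Theta_hat, *_ = np.linalg.lstsq(X, Y @ P_perp, rcond=None)
    return F_hat, Theta_hat


if __name__ == "__main__":
    X, Y, Theta = simulate()
    F_hat, Theta_hat = hive_sketch(X, Y, K=2)
    scale = np.linalg.norm(Theta)
    print("relative error, naive regression:", np.linalg.norm(F_hat - Theta) / scale)
    print("relative error, three-step sketch:", np.linalg.norm(Theta_hat - Theta) / scale)
```

In this setup the naive regression coefficient absorbs the effect of the hidden variables, while the projected estimate should land much closer to Θ∗. The sketch does not exploit the sparsity of Ψ∗; doing so would require the penalized first step described in the abstract.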

Journal: The Annals of Statistics
Publication Date: Apr 1, 2022
Citations: 4