Abstract
Suppose that we observe $y\in\mathbb{R}^{n}$ and $X\in\mathbb{R}^{n\times m}$ in the following errors-in-variables model: \begin{eqnarray*}y&=&X_{0}\beta^{*}+\epsilon\\X&=&X_{0}+W\end{eqnarray*} where $X_{0}$ is an $n\times m$ design matrix with independent subgaussian row vectors, $\epsilon\in\mathbb{R}^{n}$ is a noise vector, and $W$ is a mean-zero $n\times m$ random noise matrix with independent subgaussian column vectors, independent of $X_{0}$ and $\epsilon$. This model differs significantly from those analyzed in the literature in that we allow the measurement error for each covariate to be a dependent vector across its $n$ observations. Such error structures appear in the science literature when modeling the trial-to-trial fluctuations in response strength shared across a set of neurons. Under sparsity and restricted eigenvalue type conditions, we show that one is able to recover a sparse vector $\beta^{*}\in\mathbb{R}^{m}$ from the model given a single observation matrix $X$ and the response vector $y$. We establish consistency in estimating $\beta^{*}$ and obtain rates of convergence in the $\ell_{q}$ norm, for $q=1,2$ for the Lasso-type estimator and for $q\in[1,2]$ for a Dantzig-type conic programming estimator. We show error bounds which approach those of the regular Lasso and the Dantzig selector as the errors in $W$ tend to 0. We analyze the convergence rates of gradient descent methods for solving the nonconvex programs and show that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to a neighborhood of the global minimizers; the size of the neighborhood is bounded by the statistical error in the $\ell_{2}$ norm. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. We provide simulation evidence illuminating the theoretical predictions.
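As a concrete illustration of the data-generating mechanism, the sketch below simulates one instance of this errors-in-variables model. The Gaussian design, the AR(1) covariance used for the columns of $W$, and all dimensions and noise levels are illustrative assumptions made for the example, not quantities taken from the paper.

```python
import numpy as np

# Illustrative simulation of the errors-in-variables model y = X0 beta* + eps,
# X = X0 + W.  The AR(1) error covariance and all sizes/noise levels below are
# assumptions made for this example only.
rng = np.random.default_rng(0)
n, m, s = 200, 400, 5                        # observations, covariates, sparsity

beta_star = np.zeros(m)
beta_star[:s] = 1.0                          # sparse target vector

X0 = rng.standard_normal((n, m))             # subgaussian design, independent rows
eps = 0.5 * rng.standard_normal(n)           # observation noise

# Measurement error: columns of W are independent of each other, but each
# column is a dependent vector across the n observations (AR(1) structure).
rho, tau = 0.6, 0.2
idx = np.arange(n)
B = tau**2 * rho ** np.abs(idx[:, None] - idx[None, :])   # n x n error covariance
W = np.linalg.cholesky(B) @ rng.standard_normal((n, m))   # each column ~ N(0, B)

y = X0 @ beta_star + eps                     # response generated from the clean design
X = X0 + W                                   # observed, corrupted design
```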
Highlights
The matrix variate normal model has a long history in psychology and social sciences. In recent years, it has become increasingly popular in biology and genomics, neuroscience, econometric theory, image and signal processing, wireless communication, and machine learning; see for example [15, 22, 17, 52, 5, 54, 18, 2, 26] and references therein.
To bound the optimization error, we show that the corrected linear regression loss function (1.9) satisfies Restricted Strong Convexity (RSC) and Restricted Smoothness (RSM) conditions when the sample size and the effective rank of the matrix $B$ satisfy certain lower bounds.
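To make the corrected loss and the composite gradient iteration concrete, the sketch below continues the simulation following the abstract. As a stand-in for the paper's loss (1.9), it uses a bias-corrected quadratic loss of the Loh–Wainwright type, $\frac{1}{2}\beta^{\top}\hat{\Gamma}\beta - \hat{\gamma}^{\top}\beta + \lambda\|\beta\|_{1}$ with $\hat{\Gamma} = X^{\top}X/n - (\mathrm{tr}(B)/n)I_{m}$ and $\hat{\gamma} = X^{\top}y/n$; the exact correction in the paper, the omitted $\ell_{1}$-ball side constraint, and the tuning values below are assumptions for illustration.

```python
# Composite (proximal) gradient descent on a corrected quadratic loss,
# continuing from the simulation sketch above (uses X, y, B, n, m, beta_star).
# The trace correction assumes the error covariance B is known; the analyzed
# algorithm additionally imposes a side constraint ||beta||_1 <= R, omitted here.
Gamma_hat = X.T @ X / n - (np.trace(B) / n) * np.eye(m)   # bias-corrected Gram matrix
gamma_hat = X.T @ y / n

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

lam = 0.25                                   # l1 penalty level (illustrative)
eta = 1.0 / np.linalg.norm(Gamma_hat, 2)     # step size from the spectral norm
beta = np.zeros(m)

for _ in range(300):
    grad = Gamma_hat @ beta - gamma_hat      # gradient of the quadratic part
    beta = soft_threshold(beta - eta * grad, eta * lam)

print("l2 estimation error:", np.linalg.norm(beta - beta_star))
```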
Summary
The matrix variate normal model has a long history in psychology and social sciences. In recent years, it has become increasingly popular in biology and genomics, neuroscience, econometric theory, image and signal processing, wireless communication, and machine learning; see for example [15, 22, 17, 52, 5, 54, 18, 2, 26] and references therein. We say that an $n \times m$ random matrix $X$ follows a matrix normal distribution with a separable covariance matrix $\Sigma_{X} = A \otimes B$ and mean $M \in \mathbb{R}^{n\times m}$, which we write as $X_{n\times m} \sim \mathcal{N}_{n,m}(M, A_{m\times m} \otimes B_{n\times n})$. See [15, 22] for more characterization and examples.
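To illustrate the separable covariance structure, the following self-contained sketch draws one matrix-normal sample. The AR(1) choices for $A$ and $B$, the dimensions, and the column-stacking convention for $\mathrm{vec}$ are illustrative assumptions.

```python
import numpy as np

# Sketch of drawing X ~ N_{n,m}(M, A ⊗ B): with Z an n x m matrix of iid N(0,1)
# entries and Cholesky factors A = L_A L_A^T, B = L_B L_B^T, the draw
# X = M + L_B Z L_A^T satisfies vec(X) ~ N(vec(M), A ⊗ B) under column-stacking.
# The AR(1) covariances and dimensions below are illustrative assumptions.
rng = np.random.default_rng(1)
n, m = 50, 20

M = np.zeros((n, m))                                       # mean matrix
i, j = np.arange(n), np.arange(m)
A = 0.5 ** np.abs(j[:, None] - j[None, :])                 # m x m column covariance
B = 0.9 ** np.abs(i[:, None] - i[None, :])                 # n x n row covariance

L_A = np.linalg.cholesky(A)
L_B = np.linalg.cholesky(B)

X = M + L_B @ rng.standard_normal((n, m)) @ L_A.T          # one matrix-normal draw
```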