Inference In High-dimensional Models Research Articles

Interpretability and stability are two important features that are desired in many contemporary big data applications arising in statistics, economics, and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter in the sense of controlling the fraction of wrongly discovered features which can enhance greatly the interpretability is still largely underdeveloped. To this end, in this article, we exploit the general framework of model-X knockoffs introduced recently in Candès, Fan, Janson and Lv [(2018 ), “Panning for Gold: ‘model X’ Knockoffs for High Dimensional Controlled Variable Selection,” Journal of the Royal Statistical Society, Series B, 80, 551–577], which is nonconventional for reproducible large-scale inference in that the framework is completely free of the use of p-values for significance testing, and suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The recipe of the method is constructing the knockoff variables by assuming a latent factor model that is exploited widely in economics and finance for the association structure of covariates. Our method and work are distinct from the existing literature in which we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables, our procedure does not require any sample splitting, we provide theoretical justifications on the asymptotic false discovery rate control, and the theory for the power analysis is also established. Several simulation examples and the real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance with desired interpretability and stability compared to some popularly used forecasting methods. Supplementary materials for this article are available online.

Read full abstract

Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators. Here, we consider linear regression in the high-dimensional regime $p>>n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^{*}\in\mathbb{R}^{p}$. Important progress has been achieved in computing confidence intervals and $p$-values for single coordinates $\theta^{*}_{i}$, $i\in\{1,\dots,p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\widehat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameters vector $\theta^{*}$ is $s_{0}$-sparse with $s_{0}=o(\sqrt{n}/\log p)$. The condition $s_{0}=o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_{0}=o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_{0}=o(n/(\log p)^{2})$. The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^{*}$, and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1+o_{n}(1)$ for i.i.d. Gaussian designs.

Read full abstract

Inference In High-dimensional Models Research Articles

Related Topics

Articles published on Inference In High-dimensional Models

IPAD: Stable Interpretable Forecasting with Knockoffs Inference

Debiasing the lasso: Optimal sample size for Gaussian designs

Efficient Covariance Approximations for Large Sparse Precision Matrices

IPAD: Stable Interpretable Forecasting with Knockoffs Inference

Likelihood-Free Inference in High-Dimensional Models.

A Structural Model of Segregation in Social Networks

A Structural Model of Segregation in Social Networks

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Inference In High-dimensional Models Research Articles

Related Topics

Articles published on Inference In High-dimensional Models

IPAD: Stable Interpretable Forecasting with Knockoffs Inference

Debiasing the lasso: Optimal sample size for Gaussian designs

Efficient Covariance Approximations for Large Sparse Precision Matrices

IPAD: Stable Interpretable Forecasting with Knockoffs Inference

Likelihood-Free Inference in High-Dimensional Models.

A Structural Model of Segregation in Social Networks

A Structural Model of Segregation in Social Networks