Two-sample covariance inference in high-dimensional elliptical models
This paper introduces a two-sample covariance test for high-dimensional elliptical models using a U-statistic estimator of the squared Frobenius norm difference. The authors establish a new central limit theorem for elliptical data, enabling asymptotic control and power analysis, with the method supported by theoretical guarantees under mild assumptions without requiring sparsity or specific dimension-to-sample-size conditions.
We propose a two-sample test for large-dimensional covariance matrices in generalized elliptical models. The test statistic is based on a U-statistic estimator of the squared Frobenius norm of the difference between the two population covariance matrices. This statistic was originally introduced by Li and Chen (2012) for the independent component model. As a key theoretical contribution, we establish a new central limit theorem for the U-statistics under elliptical data, valid under both the null and alternative hypotheses. This result enables asymptotic control of the test level and facilitates a power analysis. To the best of our knowledge, the proposed test is the first such method to be supported by theoretical guarantees for elliptical data. Our approach imposes only mild assumptions on the covariance matrices and requires neither sparsity nor explicit growth conditions on the dimension-to-sample-size ratio. We illustrate our theoretical findings through applications to both synthetic and real-world data.
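To make the idea of a U-statistic estimator of the squared Frobenius norm concrete, here is a minimal Python sketch. It is a simplified illustration, not the statistic of Li and Chen (2012) or of the paper: it assumes both samples have known zero mean, so the mean-correction terms of the full statistic are omitted. The function name `frobenius_diff_ustat` is hypothetical. It relies on the identity ||Σ1 − Σ2||_F² = tr(Σ1²) + tr(Σ2²) − 2 tr(Σ1Σ2) and estimates each trace without bias from pairwise inner products.

```python
import numpy as np


def frobenius_diff_ustat(x, y):
    """Unbiased estimate of ||Sigma1 - Sigma2||_F^2 for mean-zero samples.

    x has shape (n1, p) and y has shape (n2, p); rows are observations.
    Simplifying assumption: both populations have known mean zero.
    """
    n1, n2 = x.shape[0], y.shape[0]
    gx = x @ x.T    # inner products within sample 1
    gy = y @ y.T    # inner products within sample 2
    gxy = x @ y.T   # inner products across the two samples

    # For independent mean-zero x_i, x_j (i != j), E[(x_i' x_j)^2] = tr(Sigma1^2),
    # so averaging squared off-diagonal entries estimates tr(Sigma1^2).
    a1 = (np.sum(gx ** 2) - np.sum(np.diag(gx) ** 2)) / (n1 * (n1 - 1))
    a2 = (np.sum(gy ** 2) - np.sum(np.diag(gy) ** 2)) / (n2 * (n2 - 1))

    # E[(x_i' y_j)^2] = tr(Sigma1 Sigma2) for independent mean-zero samples.
    c = np.sum(gxy ** 2) / (n1 * n2)

    return a1 + a2 - 2.0 * c
```

Excluding the diagonal terms is what makes the estimate unbiased: the naive plug-in tr(S²) of a sample covariance S is biased upward when the dimension is comparable to the sample size. A test would additionally need a variance estimate and the limiting distribution, which the abstract attributes to the new central limit theorem and which this sketch does not attempt.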
- Research Article
18
- 10.1111/biom.13013
- Dec 14, 2018
- Biometrics
Drawing inferences for high-dimensional models is challenging as regular asymptotic theories are not applicable. This article proposes a new framework for simultaneous estimation and inference for high-dimensional linear models. By smoothing over partial regression estimates based on a given variable selection scheme, we reduce the problem to low-dimensional least squares estimations. The procedure, termed Selection-assisted Partial Regression and Smoothing (SPARES), utilizes data splitting along with variable selection and partial regression. We show that the SPARES estimator is asymptotically unbiased and normal, and derive its variance via a nonparametric delta method. The utility of the procedure is evaluated under various simulation scenarios and via comparisons with the de-biased LASSO estimators, a major competitor. We apply the method to analyze two genomic datasets and obtain biologically meaningful results.
- Research Article
- 10.1080/07350015.2023.2191672
- Apr 14, 2023
- Journal of Business & Economic Statistics
This article proposes a general two-directional simultaneous inference (TOSI) framework for high-dimensional models with a manifest variable or latent variable structure, for example, high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. TOSI performs simultaneous inference on a set of parameters from two directions: one tests whether the parameters assumed to be zero are indeed zero, and the other tests whether any zeros exist in the set of parameters assumed to be nonzero. As a result, we can better identify whether the parameters are zeros, thereby keeping the data structure fully and parsimoniously expressed. We theoretically prove that the single-split TOSI is asymptotically unbiased and the multi-split version of TOSI can control the Type I error below the prespecified significance level. Simulations are conducted to examine the performance of the proposed method in finite sample situations and two real datasets are analyzed. The results show that the TOSI method can provide more predictive and more interpretable estimators than existing methods.
- Research Article
4
- 10.1016/j.jeconom.2023.105650
- Jan 9, 2024
- Journal of Econometrics
Reprint: Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic
- Research Article
21
- 10.1016/j.jeconom.2022.03.001
- Apr 8, 2022
- Journal of Econometrics
Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic
- Single Report
89
- 10.1920/wp.cem.2018.3518
- Jun 12, 2018
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means. Within this context, we review fundamental results including high-dimensional central limit theorems, bootstrap approximation of high-dimensional limit distributions, and moderate deviation theory. We also review key concepts underlying inference when many parameters are of interest such as multiple testing with family-wise error rate or false discovery rate control. We then turn to a general high-dimensional minimum distance framework with a special focus on generalized method of moments problems where we present results for estimation and inference about model parameters. The presented results cover a wide array of econometric applications, and we discuss several leading special cases including high-dimensional linear regression and linear instrumental variables models to illustrate the general results.
- Research Article
3
- 10.1145/3412815.3416883
- Oct 18, 2020
- FODS '20 : proceedings of the 2020 ACM-IMS Foundations of Data Science Conference : October 19-20, 2020, Virtual Event, USA. ACM-IMS Foundations of Data Science Conference (2020 : Online)
This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a 'naive' ridge estimator in extension of work by Bühlmann (2013) to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments, in which we show our method outperforms those that fail to account for correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.
- Research Article
13
- 10.3390/math10030463
- Jan 31, 2022
- Mathematics
In high-dimensional regression models, the Bayesian lasso with the Gaussian spike and slab priors is widely adopted to select variables and estimate unknown parameters. However, it involves large matrix computations in a standard Gibbs sampler. To solve this issue, the Skinny Gibbs sampler is employed to draw observations required for Bayesian variable selection. However, when the sample size is much smaller than the number of variables, the computation is rather time-consuming. As an alternative to the Skinny Gibbs sampler, we develop a variational Bayesian approach to simultaneously select variables and estimate parameters in high-dimensional linear mixed models under the Gaussian spike and slab priors of population-specific fixed-effects regression coefficients, which are reformulated as a mixture of a normal distribution and an exponential distribution. The coordinate ascent algorithm, which can be implemented efficiently, is proposed to optimize the evidence lower bound. The Bayes factor, which can be computed with the path sampling technique, is presented to compare two competing models in the variational Bayesian framework. Simulation studies are conducted to assess the performance of the proposed variational Bayesian method. An empirical example is analyzed by the proposed methodologies.
- Single Report
2
- 10.1920/wp.cem.2019.2919
- Jun 12, 2019
Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters d being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model should be recovered. Our results highlight how, in high-dimensional settings, graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.
- Research Article
16
- 10.1214/22-ba1332
- Sep 1, 2023
- Bayesian Analysis
We consider a Gaussian variational approximation of the posterior density in high-dimensional state space models. The number of parameters in the covariance matrix of the variational approximation grows as the square of the number of model parameters, so it is necessary to find simple yet effective parametrisations of the covariance structure when the number of model parameters is large. We approximate the joint posterior density of the state vectors by a dynamic factor model, having Markovian time dependence and a factor covariance structure for the states. This gives a reduced description of the dependence structure for the states, as well as a temporal conditional independence structure similar to that in the true posterior. We illustrate the methodology on two examples. The first is a spatio-temporal model for the spread of the Eurasian collared-dove across North America. Our approach compares favorably to a recently proposed ensemble Kalman filter method for approximate inference in high-dimensional hierarchical spatio-temporal models. Our second example is a Wishart-based multivariate stochastic volatility model for financial returns, which is outside the class of models the ensemble Kalman filter method can handle.
- Research Article
9
- 10.2139/ssrn.3313987
- Jun 18, 2019
- SSRN Electronic Journal
Non-Parametric Inference Adaptive to Intrinsic Dimension
- Single Report
78
- 10.1920/wp.cem.2011.4111
- Dec 30, 2011
This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on ℓ1-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.
- Research Article
4
- 10.2139/ssrn.3376794
- Jan 1, 2018
- SSRN Electronic Journal
Maximum Likelihood Estimation and Inference for High Dimensional Nonlinear Factor Models with Application to Factor-Augmented Regressions
- Single Report
11
- 10.1920/wp.cem.2014.5014
- Dec 31, 2014
We consider estimation and inference in panel data models with additive unobserved individual specific heterogeneity in a high dimensional setting. The setting allows the number of time varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time varying variables after eliminating the individual specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual specific heterogeneity as fixed effects which allows this heterogeneity to be related to the observed time varying variables in an unspecified way and allows that this heterogeneity may be non-zero for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variables models with fixed effects and many instruments. An input to developing the properties of our proposed procedures is the use of a variant of the Lasso estimator that allows for a grouped data structure where data across groups are independent and dependence within groups is unrestricted. We provide formal conditions within this structure under which the proposed Lasso variant selects a sparse model with good approximation properties. We present simulation results in support of the theoretical developments and illustrate the use of the methods in an application aimed at estimating the effect of gun prevalence on crime rates.
- Research Article
43
- 10.2139/ssrn.1694387
- Jan 1, 2010
- SSRN Electronic Journal
A Structural Model of Segregation in Social Networks
- Research Article
36
- 10.2139/ssrn.2294957
- Nov 5, 2010
- SSRN Electronic Journal
A Structural Model of Segregation in Social Networks