Abstract

Heterogeneity is an unwanted source of variation when analyzing aggregated datasets from multiple sources. Although different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify them. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we can remove the batch effects and enhance the inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasiveness assumption that the latent heterogeneity factors simultaneously affect a large fraction of the observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the 'Blessing of Dimensionality'. As an illustrative application of this generic framework, we consider the problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.
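
To make the adjust-then-aggregate idea concrete, the sketch below estimates latent factors separately for each source by plain PCA (the paper also develops a Projected-PCA variant that incorporates covariates), subtracts the low-rank factor component, and pools the residuals. This is a minimal sketch under our own naming conventions and toy data, not the authors' implementation.

```python
# Minimal sketch of per-source heterogeneity adjustment via PCA.
# All names and the toy data are illustrative, not from the paper's code.
import numpy as np

def adjust_heterogeneity(X, K):
    """X: (T, p) observations from one source; K: number of latent factors.
    Returns the (T, p) residuals U = X_centered - F @ Lambda.T from a rank-K PCA fit."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U_svd, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U_svd[:, :K] * s[:K]                     # estimated factors, shape (T, K)
    Lambda = Vt[:K].T                            # estimated loadings, shape (p, K)
    return Xc - F @ Lambda.T                     # heterogeneity-adjusted residuals

# Adjust each source separately, then pool the (approximately homogeneous) residuals.
rng = np.random.default_rng(0)
sources = [rng.standard_normal((200, 50)) for _ in range(3)]   # toy stand-in data
U_pooled = np.vstack([adjust_heterogeneity(X, K=2) for X in sources])
```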

Highlights

  • Aggregating and analyzing heterogeneous data is one of the most fundamental challenges in scientific data analysis

  • We model the heterogeneity by a semiparametric factor model

  • We introduce the ALPHA framework for heterogeneity adjustment

Summary

Introduction

Aggregating and analyzing heterogeneous data is one of the most fundamental challenges in scientific data analysis. To properly analyze data aggregated from multiple sources, we need to carefully model and adjust for the heterogeneity effect. However, a gap still exists between practice and theory. To bridge this gap, we propose a generic theoretical framework to model, estimate, and adjust heterogeneity across multiple datasets. Writing the factor model for the i-th dataset as x_it = Λ_i f_it + u_it, we denote u_it = x_it − Λ_i f_it as the heterogeneity-adjusted signal, which can be treated as homogeneous across different datasets and combined for downstream statistical analysis. The idea of covariate-adjusted precision matrix estimation has been studied by Cai et al. (2012), but the factor model they used assumes observed factors and no heterogeneity issue, i.e., a single dataset (m = 1).
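
As an illustration of the downstream analysis, the following sketch feeds pooled adjusted residuals to a sparse precision-matrix estimator. We use scikit-learn's graphical lasso purely as a generic stand-in; it is not necessarily the estimator used in the paper, and the random data below merely stands in for genuine adjusted residuals.

```python
# Illustrative downstream step: estimate a sparse precision matrix Omega
# from pooled heterogeneity-adjusted residuals. Graphical lasso is a
# generic stand-in estimator, not necessarily the paper's choice.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
U_pooled = rng.standard_normal((600, 50))   # stand-in for pooled residuals u_it

gl = GraphicalLasso(alpha=0.2).fit(U_pooled)
Omega_hat = gl.precision_                   # sparse estimate of the precision matrix
print(Omega_hat.shape)                      # (50, 50)
```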

Problem Setup
Semiparametric factor model
Modeling assumptions and general methodology
Regime 1
Regime 2
The ALPHA Framework
Estimating factors by PCA
Estimating factors by Projected-PCA
Specification test
Estimating number of factors
Summary of ALPHA
Conditional Graphical Model
Covariance estimation
Precision matrix estimation
Numerical Studies
Preliminary analysis
Synthetic datasets
Model calibration and data generation
Estimation of Σ
Estimation of Ω
Brain image network data
Discussions
Appendix A: Algorithm for ALPHA
Convergence of factors F
Findings
Appendix F: Technical lemmas