Heterogeneity adjustment with applications to graphical model inference.

Jianqing Fan, Han Liu, Weichen Wang, Ziwei Zhu

doi:10.1214/18-ejs1466

Abstract

Heterogeneity is an unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the batch effects and to enhance the inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasive assumption that the latent heterogeneity factors simultaneously affect a fraction of observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the 'Blessing of Dimensionality'. As an illustrative application of this generic framework, we consider the problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.

Highlights

Aggregating and analyzing heterogeneous data is one of the most fundamental challenges in scientific data analysis
We model the heterogeneity by a semiparametric factor model
We introduce the ALPHA framework for heterogeneity adjustment

Summary

Introduction

Aggregating and analyzing heterogeneous data is one of the most fundamental challenges in scientific data analysis. To properly analyze data aggregated from multiple sources, we need to carefully model and adjust the heterogeneity effect. A gap still exists between practice and theory. To bridge this gap, we propose a generic theoretical framework to model, estimate, and adjust heterogeneity across multiple datasets. We denote Ui = Xit − ΛiFi to be the heterogeneity-adjusted signal, which can be treated as homogeneous across different datasets and can be combined for downstream statistical analysis. The idea of covariate-adjusted precision matrix estimation has been studied by Cai et al. (2012), but the factor model they used assumes observed factors and no heterogeneity issue, i.e., m = 1.
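The adjust-then-aggregate idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each dataset is a p x n matrix following a plain low-rank-plus-noise model, that the number of factors per dataset is known, and that the factor component is estimated by ordinary PCA through a truncated SVD. The function names `adjust_heterogeneity` and `aggregate_residuals` are hypothetical.

```python
import numpy as np


def adjust_heterogeneity(X, k):
    """Subtract a rank-k PCA fit from one p x n data matrix (assumed model)."""
    # Center each variable (row) so the fit captures covariation, not means.
    Xc = X - X.mean(axis=1, keepdims=True)
    # The leading k singular triplets give the best rank-k approximation.
    left, sing, right_t = np.linalg.svd(Xc, full_matrices=False)
    low_rank = (left[:, :k] * sing[:k]) @ right_t[:k, :]
    # What remains plays the role of the heterogeneity-adjusted signal.
    return Xc - low_rank


def aggregate_residuals(datasets, ks):
    """Adjust every dataset separately, then pool residuals column-wise."""
    residuals = [adjust_heterogeneity(X, k) for X, k in zip(datasets, ks)]
    return np.hstack(residuals)


# Synthetic illustration: two sources sharing p variables, each with its own
# loadings and factors (all sizes and noise levels are arbitrary assumptions).
rng = np.random.default_rng(0)
p = 20
datasets = []
for n in (50, 80):
    loadings = rng.normal(size=(p, 2))
    factors = rng.normal(size=(2, n))
    noise = 0.1 * rng.normal(size=(p, n))
    datasets.append(loadings @ factors + noise)

pooled = aggregate_residuals(datasets, ks=[2, 2])
print(pooled.shape)  # (20, 130)
```

The pooled residual matrix could then feed a downstream estimator such as a covariance or precision matrix estimate. In practice the number of factors would itself have to be estimated rather than supplied.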

Problem Setup

Semiparametric factor model

Modeling assumptions and general methodology

Regime 1

Regime 2

The ALPHA Framework

Estimating factors by PCA

Estimating factors by Projected-PCA

Specification test

Estimating number of factors

Summary of ALPHA

Conditional Graphical Model

Covariance estimation

Precision matrix estimation

Numerical Studies

Preliminary analysis

Synthetic datasets

Model calibration and data generation

Estimation of Σ

Estimation of Ω

Brain image network data

Discussions

A Algorithm for ALPHA

Convergence of factors F

Findings

F Technical lemmas


Journal: Electronic Journal of Statistics
Publication Date: Jan 1, 2018
Citations: 7
License type: cc-by

Similar Papers

Clustering on Multi-source Incomplete Data via Tensor Modeling and Factorization
Weixiang Shao ... Lifang He
01 Jan 2015

Variational BEJG Solvers for Marginal-MAP Inference with Accurate Approximation of B-Conditional Entropy
Igor Kiselev
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 33
17 Jul 2019

Multi-Source Causal Analysis: Learning Bayesian Networks from Multiple Datasets
Ioannis Tsamardinos ... Asimakis P Mariglis
01 Jan 2009

Efficient algorithms for fast integration on large data sets from multiple sources
Tian Mi ... Sanguthevar Rajasekaran
BMC Medical Informatics and Decision Making | VOL. 12
28 Jun 2012
