Integrative analysis of individual-level data and high-dimensional summary statistics.

Sheng Fu, Han Zhang, William Wheeler, Lu Deng, Jing Qin, Kai Yu

doi:10.1093/bioinformatics/btad156

Abstract

Researchers usually conduct statistical analyses based on models built on raw data collected from individual participants (individual-level data). There is growing interest in enhancing inference efficiency by incorporating aggregated summary information from other sources, such as summary statistics on genetic markers' marginal associations with a given trait generated by genome-wide association studies. However, combining high-dimensional summary data with individual-level data using existing integrative procedures can be challenging because of various numerical issues that arise when optimizing an objective function over a large number of unknown parameters. We develop a procedure that improves the fitting of a targeted statistical model by leveraging external summary data, yielding more efficient statistical inference (both effect estimation and hypothesis testing). To make this procedure scalable to high-dimensional summary data, we propose a divide-and-conquer strategy that breaks the task into easier parallel jobs, each fitting the targeted model by integrating the individual-level data with a small proportion of the summary data. We obtain the final estimates of the model parameters by pooling the results from the multiple fitted models through a minimum distance estimation procedure. We improve the procedure for a general class of additive models commonly encountered in genetic studies. We further extend these two approaches to integrate individual-level and high-dimensional summary data from different study populations. We demonstrate the advantages of the proposed methods through simulations and an application to a study of the effect on pancreatic cancer risk of a polygenic risk score defined by BMI-associated genetic markers. An R package is available at https://github.com/fushengstat/MetaGIM.
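The pooling step can be illustrated with a small sketch. The abstract does not spell out the estimator, so what follows assumes the textbook minimum distance (generalized least squares) form: stack the parameter estimates from the K parallel fits into one vector with an estimated joint covariance V, and choose the common parameter theta that minimizes (est - A theta)' V^-1 (est - A theta). The function names `pool_min_distance` and `stacked_identity` are hypothetical and are not taken from the MetaGIM package; estimating V itself (the fits are correlated because they all reuse the same individual-level data) is outside the scope of this sketch.

```python
import numpy as np


def stacked_identity(n_fits, n_params):
    """Hypothetical helper: design matrix A when every parallel fit
    estimates the same parameter vector (n_fits stacked identity blocks)."""
    return np.tile(np.eye(n_params), (n_fits, 1))


def pool_min_distance(estimates, joint_cov, design):
    """Minimum distance pooling of stacked estimates (illustrative sketch only).

    estimates : length-m vector of stacked estimates from the parallel fits
    joint_cov : (m, m) estimated joint covariance V of those estimates,
                including the cross-fit covariance blocks
    design    : (m, p) matrix A with E[estimates] = A @ theta
    Returns theta_hat = (A' V^-1 A)^-1 A' V^-1 estimates and its
    covariance (A' V^-1 A)^-1.
    """
    est = np.asarray(estimates, dtype=float)
    V = np.asarray(joint_cov, dtype=float)
    A = np.asarray(design, dtype=float)
    # Compute V^-1 A and V^-1 est via linear solves instead of inverting V
    Vinv_A = np.linalg.solve(V, A)
    Vinv_est = np.linalg.solve(V, est)
    info = A.T @ Vinv_A  # A' V^-1 A
    theta_hat = np.linalg.solve(info, A.T @ Vinv_est)
    cov_theta = np.linalg.inv(info)
    return theta_hat, cov_theta


if __name__ == "__main__":
    # Toy example: two independent fits of one scalar parameter,
    # with estimates 1.0 and 2.0 and variances 1.0 and 4.0.
    A = stacked_identity(n_fits=2, n_params=1)
    theta, cov = pool_min_distance([1.0, 2.0], np.diag([1.0, 4.0]), A)
    print(theta, cov)
```

With independent fits and a scalar parameter, this reduces to inverse-variance weighting: the toy example pools to (1/1 + 2/4) / (1/1 + 1/4) = 1.2 with variance 1 / 1.25 = 0.8. When the fits share data, the off-diagonal blocks of V matter, and ignoring them would misstate the pooled variance.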


Journal: Bioinformatics (Oxford, England)
Publication Date: Mar 25, 2023
Citations: 2	License type: cc-by

Similar Papers

Mendelian Randomization Analysis of n-6 Polyunsaturated Fatty Acid Levels and Pancreatic Cancer Risk.
Cancer Epidemiology, Biomarkers & Prevention | VOL. 29
01 Dec 2020

Designs for the Combination of Group- and Individual-level Data
Sebastien Haneuse ... Scott Bartell
Epidemiology | VOL. 22
01 May 2011

Joint analysis of individual-level and summary-level GWAS data by leveraging pleiotropy.
Mingwei Dai ... Jin Liu
Bioinformatics | VOL. 35
11 Oct 2018

IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.
Mingwei Dai ... Can Yang
Bioinformatics | VOL. 33
11 May 2017
