Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.

Jack M Wolf, Jason Westra, Nathan Tintle

doi:10.3389/fgene.2021.745901


Open Access

https://doi.org/10.3389/fgene.2021.745901


Abstract

While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising area of recent development is pre-computing non-individually identifiable summary statistics to be made publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to use pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships between products of phenotypes (e.g., ratios; logical combinations of binary phenotypes using “and” and “or”) with customized covariate choices. We propose a method to approximate covariate-adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate our method’s accuracy through several simulation studies and an application modeling ratios of fatty acids using data from the Framingham Heart Study. These studies show a consistent ability to recapitulate the results of analyses performed on individual-level data, including maintenance of the Type I error rate, power, and effect size estimates. An implementation of the proposed method is provided in the publicly available R package pcsstools.
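To make the underlying idea concrete, here is a minimal sketch of the basic fact the approach builds on: an ordinary least squares fit of a phenotype on a SNP and a covariate can be recovered from means and covariances alone, without individual-level records. It is not the pcsstools implementation and does not cover the paper's approximation for products or logical combinations of phenotypes. The simulated data, the variable names (`g`, `x`, `y`), and the helper `fit_from_summary` are all assumptions made for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def fit_from_summary(means, covs):
    """OLS fit of y ~ g + x using only means and (co)variances.

    means: dict keyed by variable name ("g", "x", "y")
    covs:  dict keyed by pairs of variable names, e.g. ("g", "x")
    Returns (intercept, coefficient on g, coefficient on x).
    """
    sgg, sxx, sgx = covs[("g", "g")], covs[("x", "x")], covs[("g", "x")]
    sgy, sxy = covs[("g", "y")], covs[("x", "y")]
    # Solve the 2x2 system [[sgg, sgx], [sgx, sxx]] @ [b_g, b_x] = [sgy, sxy].
    det = sgg * sxx - sgx ** 2
    b_g = (sxx * sgy - sgx * sxy) / det
    b_x = (sgg * sxy - sgx * sgy) / det
    b_0 = means["y"] - b_g * means["g"] - b_x * means["x"]
    return b_0, b_g, b_x

# Hypothetical individual-level data, simulated only to produce summary statistics.
random.seed(1)
n = 500
g = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]  # SNP coded 0/1/2
x = [int(random.random() < 0.5) for _ in range(n)]                    # binary covariate
y = [1.0 + 0.5 * gi - 0.25 * xi + random.gauss(0, 1) for gi, xi in zip(g, x)]

# The "pre-computed summary statistics": means and pairwise covariances.
data = {"g": g, "x": x, "y": y}
means = {k: mean(v) for k, v in data.items()}
covs = {(a, b): cov(data[a], data[b]) for a in data for b in data}

b_0, b_g, b_x = fit_from_summary(means, covs)

# Residuals are computed from the raw data purely as a check: an OLS solution
# leaves residuals with zero sum that are orthogonal to each predictor.
resid = [yi - (b_0 + b_g * gi + b_x * xi) for gi, xi, yi in zip(g, x, y)]
print(b_0, b_g, b_x)
```

The residual check at the end is only there to confirm that the summary-statistic fit coincides with the least squares solution on the raw data; in a real summary-statistic setting the raw data would not be available.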

Highlights

Researchers have readily available access to massive quantities of genotypic and phenotypic data (Cox, 2018; Simell et al., 2019)
We modeled the phenotype product as a function of a single nucleotide polymorphism (SNP) and a binary covariate
We modeled 12 fatty acid ratios with both individual participant data (IPD) and pre-computed summary statistics (PCSS), using data from the Framingham Heart Study’s Generation-3 and Offspring cohorts downloaded from dbGaP (Mailman et al., 2007)

Summary

Introduction

Researchers have readily available access to massive quantities of genotypic and phenotypic data (Cox, 2018; Simell et al., 2019). Via the Electronic Medical Records and Genomics (eMERGE) Network, the UKBiobank (Bycroft et al., 2018), and other initiatives and repositories [e.g., 23andMe, MGI2 (Gagliano Taliun et al., 2020), FINRISK, CHOP (Diogo et al., 2018), among others], researchers can access a wide variety of phenotypic and genomic data on hundreds of thousands of individuals. The size of biobank datasets makes it challenging to transfer, store, and analyze data locally. While cloud computing mitigates some of these issues, it brings its own challenges related to cost (storage and computation), transfer, and access.



Journal: Frontiers in Genetics
Publication Date: Oct 12, 2021
Citations: 1
License type: CC BY 4.0
