Abstract

While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising recent development is pre-computing summary statistics that do not identify individuals and making them publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to use pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships with products of phenotypes (e.g., ratios, or logical combinations of binary phenotypes using “and” and “or”) under customized covariate choices. We propose a method to approximate covariate-adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate the method’s accuracy through several simulation studies and an application modeling ratios of fatty acids with data from the Framingham Heart Study. These studies show that the method consistently recapitulates the results of analyses performed on individual-level data, preserving the Type I error rate, power, and effect size estimates. An implementation of the proposed method is provided in the publicly available R package pcsstools.
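The premise that a covariate-adjusted linear model can be fit from summary statistics instead of individual-level data can be illustrated with a minimal Python/NumPy sketch. It assumes the available statistics are the sample size, the column means, and the sample covariance matrix of the SNP, a binary covariate, and the phenotype. The function `pcss_ols` and the simulated data are hypothetical, and this is not the pcsstools implementation. For 0/1 phenotypes it encodes “and” as y1·y2 and “or” as y1 + y2 − y1·y2. Note that the summary statistics here are computed directly from the combined phenotype; the paper’s approximation of those statistics from single-phenotype statistics is not reproduced.

```python
import numpy as np


def pcss_ols(n, means, cov, y_index):
    """Fit y ~ intercept + predictors from summary statistics only (hypothetical helper).

    n       -- sample size
    means   -- column means of [predictors..., outcome]
    cov     -- sample covariance matrix (n - 1 denominator) of the same columns
    y_index -- position of the outcome column
    Returns (intercept and slopes, standard errors of the slopes).
    """
    means = np.asarray(means, dtype=float)
    cov = np.asarray(cov, dtype=float)
    x_idx = [i for i in range(len(means)) if i != y_index]
    s_xx = cov[np.ix_(x_idx, x_idx)]   # predictor covariance block
    s_xy = cov[x_idx, y_index]         # predictor-outcome covariances
    slopes = np.linalg.solve(s_xx, s_xy)
    intercept = means[y_index] - means[x_idx] @ slopes
    p = len(x_idx) + 1                 # parameters, including the intercept
    # Residual sum of squares recovered from the outcome variance.
    sse = (n - 1) * (cov[y_index, y_index] - s_xy @ slopes)
    sigma2 = sse / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(s_xx) / (n - 1)))
    return np.concatenate([[intercept], slopes]), se


def sample_cov(a, b):
    """Sample covariance of two vectors (n - 1 denominator)."""
    return np.cov(a, b)[0, 1]


# Simulated individual-level data (hypothetical, for illustration only).
rng = np.random.default_rng(2024)
n = 5000
g = rng.binomial(2, 0.3, n).astype(float)    # SNP allele count: 0, 1, or 2
c = rng.binomial(1, 0.5, n).astype(float)    # binary covariate
y1 = (rng.random(n) < 0.20 + 0.05 * g).astype(float)
y2 = (rng.random(n) < 0.30 + 0.10 * c).astype(float)

# Logical combinations of 0/1 phenotypes written as arithmetic.
y_and = y1 * y2               # 1 only when both phenotypes are present
y_or = y1 + y2 - y1 * y2      # 1 when at least one phenotype is present

# Individual participant data (IPD) fit of y_or ~ g + c, for comparison.
X = np.column_stack([np.ones(n), g, c])
beta_ipd, *_ = np.linalg.lstsq(X, y_or, rcond=None)
resid = y_or - X @ beta_ipd
sigma2_ipd = resid @ resid / (n - X.shape[1])
se_ipd = np.sqrt(np.diag(sigma2_ipd * np.linalg.inv(X.T @ X)))[1:]

# Summary-statistic fit: only n, means, and covariances are used.
data = np.column_stack([g, c, y_or])
beta_pcss, se_pcss = pcss_ols(n, data.mean(axis=0), np.cov(data, rowvar=False), y_index=2)
print("IPD :", beta_ipd, se_ipd)
print("PCSS:", beta_pcss, se_pcss)

# Covariance is linear, so the "or" association splits into single-phenotype
# terms plus cov(g, y1 * y2), which single-phenotype statistics do not contain.
decomposed = sample_cov(g, y1) + sample_cov(g, y2) - sample_cov(g, y_and)
print(np.isclose(sample_cov(g, y_or), decomposed))
```

The two fits agree up to floating-point error because ordinary least squares depends on the data only through these moments. The final check indicates why combined phenotypes need extra work: under the assumption that only single-phenotype statistics are available, the cov(g, y1·y2) term is missing and must be approximated.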

Highlights

  • Researchers now have ready access to massive quantities of genotypic and phenotypic data (Cox, 2018; Simell et al., 2019)

  • We modeled the phenotype product as a function of a single nucleotide polymorphism (SNP) and a binary covariate

  • We modeled 12 fatty acid ratios with both individual participant data (IPD) and pre-computed summary statistics (PCSS), using data from the Framingham Heart Study’s Generation-3 and Offspring cohorts downloaded from dbGaP (Mailman et al., 2007)


Summary

Introduction

Researchers now have ready access to massive quantities of genotypic and phenotypic data (Cox, 2018; Simell et al., 2019). Through the Electronic Medical Records and Genomics (eMERGE) Network, the UK Biobank (Bycroft et al., 2018), and other initiatives and repositories (e.g., 23andMe, MGI [Gagliano Taliun et al., 2020], FINRISK, and CHOP [Diogo et al., 2018], among others), researchers can access a wide variety of phenotypic and genomic data on hundreds of thousands of individuals. The size of biobank datasets makes it challenging to transfer, store, and analyze the data locally. While cloud computing mitigates some of these issues, it brings its own challenges related to cost (storage and computation), data transfer, and access.
