Abstract

The growing volume and complexity of whole-genome sequence (WGS) and multi-omic data require new analytic approaches beyond those developed for the GWAS era. In response to this challenge, we present an Analysis Commons, which brings together genotype and phenotype data from multiple studies, along with a suite of powerful and validated analysis tools, in a secure cloud-computing framework that is equitably accessible to associated investigators. This framework is designed to address the emerging challenges of multi-center WGS analyses (data sharing mechanisms, phenotype harmonization, -omics integration, and annotation) and the need for flexible, secure, efficient, high-performance computing for numerous users. The Analysis Commons is built on the DNAnexus cloud platform, which provides large parallel compute resources and robust security protocols. To permit multi-center data sharing, we implemented two parallel data sharing approaches: (1) a multi-lateral consortium agreement that enables data sharing across multiple studies, and (2) coordinated dbGaP applications among groups of institutions. Investigators with detailed knowledge of the phenotypes and contributing studies harmonize data from multiple sources for maintenance in a central database. The Analysis Commons supports multiple association-analysis software packages, as well as tools for annotation and visualization. Importantly, approved investigators have full access to the combined data sets, facilitating the rapid development and deployment of new methods. We demonstrate the Analysis Commons model with an analysis of fibrinogen in 3999 participants from the Old Order Amish Study and the Framingham Heart Study with WGS from the Trans-Omics for Precision Medicine (TOPMed) Program. We performed and validated single-variant and SKAT analyses using GENESIS and MMAP pipelines, accounting for relatedness with linear mixed models. We confirmed a known association of a nonsynonymous variant in FGG (p=2.5e-9, MAF=0.34%, rs148685782). No other single-variant or SKAT association was significant after correcting for the number of tests. Analyses were run in parallel across 1408 cores and took less than one hour of wall-clock time. The Analysis Commons offers the necessary infrastructure support for analysis of WGS and multi-omic data in a setting that empowers phenotype, analytic, and computational experts to transform raw data into knowledge of the determinants of cardiovascular health.
