Abstract

Motivation: LD score regression is a reliable and efficient method of using genome-wide association study (GWAS) summary-level results data to estimate the SNP heritability of complex traits and diseases, partition this heritability into functional categories, and estimate the genetic correlation between different phenotypes. Because the method relies on summary-level results data, LD score regression is computationally tractable even for very large sample sizes. However, publicly available GWAS summary-level data are typically stored in different databases and have different formats, making it difficult to apply LD score regression to estimate genetic correlations across many different traits simultaneously.

Results: In this manuscript, we describe LD Hub, a centralized database of summary-level GWAS results for 173 diseases/traits from different publicly available resources/consortia, together with a web interface that automates the LD score regression analysis pipeline. To demonstrate functionality and validate our software, we replicated previously reported LD score regression analyses of 49 traits/diseases using LD Hub, and estimated SNP heritability and the genetic correlation across the different phenotypes. We also present new results obtained by uploading a recent atopic dermatitis GWAS meta-analysis to examine the genetic correlation between the condition and other potentially related traits. In response to the growing availability of publicly accessible GWAS summary-level results data, our database and the accompanying web interface will ensure maximal uptake of the LD score regression methodology, provide a useful database for the public dissemination of GWAS results, and provide a method for easily screening hundreds of traits for overlapping genetic aetiologies.

Availability and Implementation: The web interface and instructions for using LD Hub are available at http://ldsc.broadinstitute.org/

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • There is substantial empirical evidence demonstrating that the majority of complex traits and diseases in humans are influenced by hundreds if not thousands of genetic loci of small effect scattered across the genome as was first predicted a century ago (East, 1916; Fisher, 1918)

  • The advent of high-throughput micro-array genotyping and next-generation sequencing technologies has meant that genome-wide data can be leveraged to ask fundamental questions concerning the underlying genetic architecture of common complex traits and diseases, including the degree to which genetic variation affecting complex phenotypes is tagged by SNPs on genome-wide arrays (Lee et al, 2011; Yang et al, 2010, 2011), the degree to which this variation represents different functional categories and/or biological pathways (Finucane et al, 2015; Gusev et al, 2014), and the extent to which genetic aetiologies are shared across different phenotypes (Bulik-Sullivan et al, 2015b; Lee et al, 2012)

  • We describe LD Hub, a web-based utility that centralizes and harmonizes summary-level genome-wide association study (GWAS) results data, and automates LD Score regression analysis (Bulik-Sullivan et al, 2015a, b)

Introduction

There is substantial empirical evidence demonstrating that the majority of complex traits and diseases in humans are influenced by hundreds if not thousands of genetic loci of small effect scattered across the genome, as was first predicted a century ago (East, 1916; Fisher, 1918). To date, most analyses of the genetic architecture of such traits have been performed using genetic restricted maximum likelihood analysis (GREML) as implemented in software packages such as GCTA and LDAK (Lee et al, 2011; Speed et al, 2012; Yang et al, 2010, 2011). These methods require individual-level genotype data, which are often unavailable because most of the largest GWAS analyses are conducted through meta-analyses and so typically report only summary results statistics (Zheng et al, 2013). GREML can also be computationally prohibitive when analyzing raw genome-wide SNP data from hundreds of thousands of individuals. As a result, most GREML analyses reported in the literature to date have been hypothesis-driven studies involving only a small number of related traits (Table 1).
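
By contrast, LD score regression needs only per-SNP summary statistics. Under the model of Bulik-Sullivan et al (2015a), the expected association chi-square statistic of SNP j is approximately N·h²·ℓ_j/M plus an intercept, where ℓ_j is the LD score of the SNP, N the GWAS sample size, M the number of SNPs and h² the SNP heritability. The Python sketch below illustrates that relationship with an ordinary (unweighted) least-squares fit on noise-free toy data. It is an illustration under stated assumptions, not the LD Hub pipeline: the function name `ldsc_h2` and all toy values are hypothetical, and real analyses use weighted regression with block-jackknife standard errors.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Hypothetical helper: unweighted regression of chi-square on LD score.

    Fits chi2_j = intercept + slope * ld_score_j and converts the slope
    to a SNP-heritability estimate via h2 = slope * M / N.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # Design matrix: a column of ones (intercept) and the LD scores.
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Toy, noise-free data (all values are assumptions for illustration).
rng = np.random.default_rng(0)
M, N, true_h2 = 1000, 50_000, 0.4
ld = rng.uniform(1.0, 200.0, size=M)   # made-up LD scores
chi2 = 1.0 + N * true_h2 * ld / M      # expected chi-square, intercept = 1

h2_hat, intercept_hat = ldsc_h2(chi2, ld, N, M)
print(f"h2 estimate: {h2_hat:.3f}, intercept: {intercept_hat:.3f}")
```

Because the toy statistics are generated without noise, the fit recovers the simulated heritability and an intercept of 1; with real GWAS summary statistics the estimates carry sampling error, and an intercept above 1 is usually read as a sign of confounding such as population stratification.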
