Abstract
Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWASs). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. I describe a statistical model that uses association statistics computed across the genome to identify classes of genomic elements that are enriched with or depleted of loci influencing a trait. The model naturally incorporates multiple types of annotations. I applied the model to GWASs of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, body mass index, and Crohn disease. For each trait, I used the model to evaluate the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over 100 tissues and cell lines. The fraction of phenotype-associated SNPs influencing protein sequence ranged from around 2% (for platelet volume) up to around 20% (for low-density lipoprotein cholesterol), repressed chromatin was significantly depleted for SNPs associated with several traits, and cell-type-specific DNase-I hypersensitive sites were enriched with SNPs associated with several traits (for example, the spleen in platelet volume). Finally, reweighting each GWAS by using information from functional genomics increased the number of loci with high-confidence associations by around 5%.
Highlights
A fundamental goal of human genetics is to create a catalog of the genetic polymorphisms that cause phenotypic variation in our species and to characterize the precise molecular mechanisms by which these polymorphisms exert their effects.
I included as annotations the output from "genome segmentation" of the six main ENCODE cell lines;[45] for each section of the genome in each cell line, Hoffman et al.[45] report whether the histone modifications in the region are consistent with enhancer activity, transcription start site (TSS), promoter-flanking regions, CTCF binding sites, or repressed chromatin.
I have developed a statistical model for identifying genomic annotations that are most relevant to the biology of a given phenotype.
Summary
A fundamental goal of human genetics is to create a catalog of the genetic polymorphisms that cause phenotypic variation in our species and to characterize the precise molecular mechanisms by which these polymorphisms exert their effects. The ENCODE project has generated detailed maps of histone modifications and transcription factor binding in six human cell lines, partly to interpret GWAS signals that might act via a mechanism of gene regulation.[3] Methods for combining potentially rich sources of functional genomic data with GWASs could in principle lead to important biological insights. The development of such a method is the aim of this paper.
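
The general idea of reweighting association evidence with functional annotations can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the model developed in the paper: it assumes a single causal SNP per region, takes per-SNP log Bayes factors as given, and treats the annotation enrichment effects as known rather than estimated from the data. The function name `snp_posteriors` and the parameters `log_bfs`, `annotations`, and `effects` are hypothetical names chosen for this example.

```python
import math


def snp_posteriors(log_bfs, annotations, effects):
    """Posterior probability that each SNP in one region is the causal one.

    Illustrative assumptions (not taken from the paper):
      log_bfs     -- natural-log Bayes factor per SNP (association evidence)
      annotations -- one tuple of 0/1 indicators per SNP, one entry per annotation
      effects     -- hypothetical log enrichment per annotation
                     (> 0 means enriched, < 0 means depleted)
    """
    # Unnormalized log prior: sum of the effects of the annotations a SNP falls in.
    log_priors = [
        sum(effect * flag for effect, flag in zip(effects, flags))
        for flags in annotations
    ]
    # Combine prior and association evidence in log space.
    log_weights = [lp + lb for lp, lb in zip(log_priors, log_bfs)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Two SNPs with identical association evidence; only the first lies in an
# annotation assumed here to be three-fold enriched for causal variants.
posteriors = snp_posteriors(
    log_bfs=[2.0, 2.0],
    annotations=[(1,), (0,)],
    effects=[math.log(3.0)],
)
```

In this toy setting the annotated SNP receives three times the posterior weight of the unannotated one. With all effects set to zero, the calculation reduces to ranking SNPs by association evidence alone.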