Abstract

Population-scale genome sequencing allows the characterization of the functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns. We use variants identified in sequencing data from the 1000 Genomes Project in an African and a European population sample, together with gene expression data from lymphoblastoid cell lines. We detect a number of expression quantitative trait loci (eQTLs) comparable to that obtained with genotypes from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate, as a function of effect size, the number of such effects harboured in an individual. Our results demonstrate that integrating genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
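Allele-specific expression (ASE) analysis compares the RNA-Seq reads carrying each allele at a heterozygous site in one individual. As an illustration, here is a minimal sketch that assumes a simple exact binomial test against an expected 50:50 allelic ratio. This is not necessarily the statistic used in the study, which may also account for read-mapping bias and overdispersion, and `ase_binomial_pvalue` is a hypothetical name.

```python
from math import comb


# Illustrative sketch, not the study's pipeline; the function name is hypothetical.
def ase_binomial_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test for allele-specific expression (ASE).

    Null hypothesis: at a heterozygous site, each RNA-Seq read carries the
    reference allele with probability 0.5 (no allelic imbalance).
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no RNA-Seq reads cover this site")
    # The null distribution Binomial(n, 0.5) is symmetric, so the two-sided
    # p-value is twice the lower-tail probability of the rarer allele count.
    k = min(ref_reads, alt_reads)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # A perfectly balanced site can give a doubled tail above 1; cap it at 1.
    return min(1.0, 2 * lower_tail)


# Hypothetical read counts at one heterozygous site: 18 reference, 2 alternative.
print(ase_binomial_pvalue(18, 2))
```

For these hypothetical counts the p-value is roughly 4.0e-4, pointing to allelic imbalance. A site with 10 reads of each allele gives a p-value of 1. In a real analysis, a test like this would be applied to many sites, and the resulting p-values would then need correction for multiple testing.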

Highlights

  • We investigate the landscape of regulatory variation revealed by population-scale sequencing, using data from the 1000 Genomes Project together with gene expression data from 60 CEU individuals (CEU: Utah residents with ancestry from northern and western Europe), measured by RNA sequencing (RNA-Seq), and from 57 CEU and 56 YRI individuals (YRI: Yoruba in Ibadan, Nigeria), measured with gene expression arrays [15,16]

  • We show that regulatory variation can putatively modify the effects of a large proportion of nonsynonymous coding variants, and we present population genetic evidence suggestive of such interactions

Summary

Introduction

Deeper characterization of genetic variation is becoming increasingly available with advances in DNA sequencing technology [1,2,3,4,5]. This improves our ability to pinpoint protein-coding variants that disrupt protein structure, and it has already begun to provide insight into the genetic basis of diseases of unknown etiology [6,7]. Compared with protein-coding variation, our knowledge of gene regulatory architecture is incomplete, and the existence of a regulatory variant is largely inferred from its association with gene expression. Previous studies have shown such associations to be widespread and tissue-specific [8,9,10,11]. Here, we demonstrate the value of the nearly complete variant information from the 1000 Genomes Project in revealing the fine structure of rare and common regulatory variation.
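Inferring a regulatory variant from its association with gene expression amounts to testing each variant-gene pair for a correlation between genotype and expression across individuals. The sketch below is a minimal illustration that assumes genotypes are coded as alternative-allele dosage (0, 1 or 2) and uses an ordinary least-squares fit. It is not the study's own pipeline, whose test statistic, covariates and multiple-testing correction may differ (rank-based correlation is another common choice), and `eqtl_linear_test` is a hypothetical name.

```python
from math import copysign, inf, sqrt
from statistics import mean


# Illustrative sketch, not the study's pipeline; the function name is hypothetical.
def eqtl_linear_test(dosage, expression):
    """Least-squares regression of expression on alternative-allele dosage.

    dosage: per-individual genotype coded as 0, 1 or 2 copies of the
    alternative allele; expression: the gene's expression level in the same
    individuals. Returns (slope, r, t), where t has n - 2 degrees of freedom
    under the null hypothesis of no association.
    """
    n = len(dosage)
    if n != len(expression):
        raise ValueError("dosage and expression must have the same length")
    if n < 3:
        raise ValueError("need at least three individuals")
    mean_x, mean_y = mean(dosage), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic variant or constant expression")
    slope = sxy / sxx                # expression change per alternative allele
    r = sxy / sqrt(sxx * syy)        # Pearson correlation coefficient
    if abs(r) >= 1.0:                # perfect fit: the t statistic is unbounded
        return slope, r, copysign(inf, r)
    t = r * sqrt((n - 2) / (1 - r * r))
    return slope, r, t


# Hypothetical data for six individuals (two per genotype class).
slope, r, t = eqtl_linear_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"slope={slope:.2f} r={r:.3f} t={t:.1f}")
```

In a genome-wide scan, a test like this would typically be repeated for every variant within a window around each gene. Its P value would be taken from the t distribution with n − 2 degrees of freedom and corrected for the number of tests. The most strongly associated variant per gene corresponds to what the abstract calls the top eQTV.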
