Abstract

Many disease risk loci identified in genome-wide association studies lie in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging because of the complex correlation structure among variants, known as linkage disequilibrium (LD), and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, an additional source of information can improve fine-mapping: allelic imbalance (AIM), which measures the imbalance in gene expression between the two chromosomes of a diploid organism. In this work, we develop a novel statistical method that leverages both AIM and total expression data to detect causal variants that regulate gene expression. We show through simulations and an application to 10 tissues of the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. Across all tissues and genes, our method achieves a median reduction of 11% in the number of putative causal variants. We use chromatin state data from the Roadmap Epigenomics Consortium to show that the putative causal variants identified by our method are enriched for active regions of the genome, providing orthogonal support that our method identifies causal variants with increased specificity.
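
Allelic imbalance is commonly quantified from RNA-seq reads that overlap a heterozygous site: if expression is balanced, reads from the two haplotypes should appear in roughly equal proportion. The minimal Python sketch below is not the method described in this work. It assumes hypothetical reference and alternative read counts at a single heterozygous SNP, reports the reference-allele fraction, and applies an exact two-sided binomial test against 0.5. The function names are hypothetical.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference allele at a heterozygous site (0.5 = balanced)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads overlap the heterozygous site")
    return ref_reads / total


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for observing k reference reads out of n.

    Sums P(X = i) over every outcome i that is no more likely than the observed k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so exact ties are not lost to floating-point error.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    ref, alt = 30, 10  # hypothetical read counts at one heterozygous SNP
    print(allelic_ratio(ref, alt))  # → 0.75
    print(binom_two_sided_p(ref, ref + alt))
```

A ratio far from 0.5 with a small p-value suggests that one haplotype is expressed more than the other, which is the kind of per-individual signal that total expression alone does not provide.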

Highlights

  • Understanding the regulation of gene expression by genetic variants is essential for identifying the biological mechanisms of gene expression and complex traits

  • Many studies have identified genetic variants that are associated with the expression of genes

  • When a region of the genome contains many genetic variants that are highly correlated with one another, non-causal variants near a causal variant are also correlated with gene expression
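
The last highlight can be illustrated with a toy simulation, which is our own hypothetical model and not taken from this work. It draws haplotypes for a causal variant and a nearby "tag" variant whose alleles mostly copy the causal ones, then generates expression from the causal variant alone. The parameters `maf`, `copy_prob` and `beta` are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=2000, maf=0.3, copy_prob=0.95, beta=0.5, seed=1):
    """Simulate genotypes (0/1/2 minor alleles) at a causal and a tag SNP, plus expression."""
    rng = random.Random(seed)
    causal, tag, expr = [], [], []
    for _ in range(n):
        # Two haplotypes per individual; True means the minor allele.
        hap_c = [rng.random() < maf for _ in range(2)]
        # The tag allele copies the causal allele with prob. copy_prob, else is redrawn (high LD).
        hap_t = [h if rng.random() < copy_prob else rng.random() < maf for h in hap_c]
        g_c, g_t = sum(hap_c), sum(hap_t)
        causal.append(g_c)
        tag.append(g_t)
        # Expression depends only on the causal genotype plus Gaussian noise.
        expr.append(beta * g_c + rng.gauss(0.0, 1.0))
    return causal, tag, expr


if __name__ == "__main__":
    causal, tag, expr = simulate()
    print("LD r(causal, tag):", round(pearson(causal, tag), 3))
    print("r(causal, expr):  ", round(pearson(causal, expr), 3))
    print("r(tag, expr):     ", round(pearson(tag, expr), 3))
```

Although only the causal variant affects expression, the tag variant shows an almost equally strong association, so association statistics alone cannot reliably tell the two apart.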

Introduction

Understanding the regulation of gene expression by genetic variants is essential for identifying the biological mechanisms of gene expression and complex traits. Several fine-mapping approaches have been developed to address the challenges posed by LD and by multiple causal variants within a locus [16,17,18,19,20,21,22]. These methods compute a posterior probability for each variant and select a set of variants, called a “causal set”, that contains all causal variants with high probability (e.g., 95%) while minimizing the number of variants in the set. Although these methods reliably include all true causal variants in the causal set, they may also include many non-causal variants in regions of high LD, which increases the number of variants that must be validated in downstream studies.
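
The causal-set idea can be illustrated with a simplified sketch that is not any of the cited methods. It assumes exactly one causal variant per locus and a uniform prior, and scores each variant with Wakefield's approximate Bayes factor computed from its z-score. The prior effect variance `w = 0.04` and the sampling variance `v` (roughly 1/N for standardized genotypes and expression; N = 500 is assumed here) are illustrative values, and all function names are hypothetical. The cited methods allow multiple causal variants, which this sketch does not.

```python
import math


def log_abf(z: float, v: float, w: float = 0.04) -> float:
    """Wakefield approximate log Bayes factor (association vs. null) for one variant.

    z: association z-score; v: sampling variance of the effect estimate;
    w: assumed prior variance of the true effect size.
    """
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def posterior_probs(zs, v, w=0.04):
    """Posterior that each variant is the single causal one, under a uniform prior."""
    logs = [log_abf(z, v, w) for z in zs]
    m = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [x / total for x in weights]


def causal_set(zs, v, rho=0.95, w=0.04):
    """Smallest set of variant indices whose posteriors sum to at least rho."""
    post = posterior_probs(zs, v, w)
    order = sorted(range(len(zs)), key=lambda i: post[i], reverse=True)
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += post[i]
        if cum >= rho:
            break
    return sorted(chosen), post


if __name__ == "__main__":
    # Two leading variants with similar z-scores, as expected under high LD.
    chosen, post = causal_set([5.1, 5.0, 2.0, 1.0, 0.5], v=1 / 500)
    print(chosen, [round(p, 3) for p in post])
```

In this toy locus the two leading variants have nearly identical z-scores, so neither alone reaches 95% posterior probability and both enter the causal set; a variant with a clearly stronger signal would yield a set of size one. Adding information that distinguishes variants in high LD, such as allelic imbalance, is what allows such sets to shrink.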
