Abstract
Mapping molecular QTLs has emerged as an important tool for understanding the genetic basis of cell functions. With the increasing availability of functional genomic data, it is natural to incorporate genomic annotations into QTL discovery. Discovering molecular QTLs is typically framed as a multiple hypothesis testing problem and solved using false discovery rate (FDR) control procedures. Most existing statistical approaches obtain a $p$-value for each candidate locus through permutation-based schemes, which are computationally inefficient and make it inconvenient to incorporate highly informative genomic annotations. In this paper, we present a novel statistical approach for integrative QTL discovery based on the theoretical framework of Bayesian FDR control. We use a Bayesian hierarchical model to integrate genomic annotations into molecular QTL mapping, and we propose an empirical Bayes computational procedure that approximates the necessary posterior probabilities with high computational efficiency. Through theoretical arguments and simulation studies, we demonstrate that the proposed approach rigorously controls the desired type I error rate and greatly improves the power of QTL discovery when informative annotations are incorporated. Finally, we demonstrate our approach by analyzing the expression-genotype data from 44 human tissues generated by the GTEx project. By integrating the simple annotation of SNP distance to transcription start sites, we discover more genes that harbor expression-associated SNPs in all 44 tissues, with an average increase of 1485 genes per tissue.
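To make the idea of Bayesian FDR control concrete, the following is a minimal sketch of the generic rejection rule, not the authors' implementation. It assumes that each candidate (for example, each gene) already has a posterior probability of being null; how the paper's empirical Bayes procedure approximates those probabilities is not shown here. The function name `bayesian_fdr_reject` and the input values are hypothetical.

```python
import numpy as np


def bayesian_fdr_reject(null_posteriors, alpha=0.05):
    """Return a boolean mask of rejected nulls under a Bayesian FDR rule.

    null_posteriors: posterior probability that each candidate is null
                     (hypothetical inputs; assumed to be computed elsewhere).
    alpha:           target FDR level.
    """
    p = np.asarray(null_posteriors, dtype=float)
    # Rank candidates from most to least likely to be a true signal.
    order = np.argsort(p, kind="stable")
    # Expected FDR of rejecting the top k candidates is the mean of their
    # null posteriors.
    running_mean = np.cumsum(p[order]) / np.arange(1, p.size + 1)
    # The running mean of an ascending sequence is non-decreasing, so the
    # number of entries at or below alpha is the largest admissible k.
    k = int(np.sum(running_mean <= alpha))
    reject = np.zeros(p.size, dtype=bool)
    reject[order[:k]] = True
    return reject


if __name__ == "__main__":
    # Made-up posterior null probabilities for five candidates.
    example = [0.01, 0.9, 0.02, 0.2, 0.5]
    print(bayesian_fdr_reject(example, alpha=0.05))
```

Because the rule only needs sorted posterior probabilities and a cumulative mean, it requires no permutations once those probabilities are available.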
Highlights
With the advancements in sequencing technology, mapping quantitative trait loci (QTLs) with cellular phenotypes has emerged as a powerful tool for understanding the genetic basis of cell functions
We have introduced a powerful statistical approach for discovering molecular QTLs using high-throughput sequencing data and dense genotype data
Through a combination of theoretical derivations, simulation studies and real applications, we have demonstrated that (i) our proposed approach rigorously controls predefined false discovery rates in QTL discovery; (ii) by naturally integrating highly informative genomic annotations, the proposed approach consistently exhibits superior power compared with the current gold-standard approaches; and (iii) by avoiding extensive permutations, our implementation of the proposed statistical methods is several hundred times faster than the standard approach
Summary
With the advancements in sequencing technology, mapping quantitative trait loci (QTLs) with cellular phenotypes has emerged as a powerful tool for understanding the genetic basis of cell functions. Recent QTL mapping studies using RNA-seq, ChIP-seq, DNaseI-seq, ATAC-seq and DNA methylation data have revealed that an abundance of genetic variants are associated with various cellular phenotypes [Ardlie et al. (2015), Banovich et al. (2014), Degner et al. (2012), Ding et al. (2014), McVicker et al. (2013)]. The discovery of molecular QTLs has provided valuable insights for understanding the molecular mechanisms of complex diseases, as demonstrated by Neto et al. (2013).
Keywords
Molecular QTL, genomic annotations, Bayesian FDR control, QTL mapping