Abstract

Background: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves.

Methods: We downloaded the raw reads for all available human RNA-seq datasets. Using these reads we performed gene expression quantification. All samples were jointly normalized and subjected to strict quality control. We also derived genotypes using the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. Our results were validated using samples for which DNA-seq genotypes were available.

Results: 4,978 public human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols are limited. In a joint analysis on the 1,262 samples with high-quality genotypes, we identified cis-eQTL effects for 8,034 unique genes (at a false discovery rate ≤0.05). eQTL mapping on individual tissues revealed that a limited number of samples already suffices to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels.

Conclusions: By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants to aid clinical interpretation of exome and genome sequencing.

Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0152-4) contains supplementary material, which is available to authorized users.
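To make the eQTL mapping step and the false discovery rate threshold more concrete, the following is a minimal sketch of a single variant-gene test followed by multiple-testing correction. It is not the authors' pipeline: the choice of a Pearson correlation with a permutation p-value, the Benjamini-Hochberg correction, and all function names and parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in genotype or expression: no signal
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(dosages, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one variant-gene pair.

    dosages: alternative-allele counts per sample (0, 1 or 2).
    expression: normalized expression of the gene in the same samples.
    n_perm and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(dosages, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression link
        if abs(pearson_r(dosages, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a variant-gene pair would be reported when its adjusted p-value is at most 0.05, mirroring the threshold quoted in the Results. A production analysis would additionally restrict tests to variants near the gene and correct for covariates.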

Highlights

  • RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus mapping or through allele-specific expression (ASE) analysis

  • Other components permit accurate identification of many tissue types such as brain (Figure 2b, components 4 and 10), liver (Figure 2c, components 14 and 11) and bladder (Figure 2d, components 4 and 38), even though the RNA-seq data for these tissues had been generated in at least six different laboratories, with often quite pronounced technical differences. These results indicate that heterogeneous RNA-seq datasets that have been aligned, normalized and quality controlled in a systematic manner yield gene-expression profiles that very clearly describe biologically coherent phenomena

  • These results indicate that researchers who would like to learn more about one specific tissue could combine different RNA-seq data for that tissue into one large dataset
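The highlights above refer to components that separate tissues across laboratories. As a rough illustration of that idea, the sketch below computes sample scores on the first principal component of a gene-centered expression matrix using power iteration. The toy matrix, the tissue labels and the function names are hypothetical, and the source does not state that this is how its components were derived.

```python
def center_genes(matrix):
    """Subtract each gene's mean across samples (rows are samples)."""
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    means = [sum(row[g] for row in matrix) / n_samples for g in range(n_genes)]
    return [[row[g] - means[g] for g in range(n_genes)] for row in matrix]


def first_component_scores(matrix, n_iter=200):
    """Project samples onto the leading principal component.

    Uses power iteration on X^T X, where X is the gene-centered matrix.
    n_iter is an illustrative default.
    """
    x = center_genes(matrix)
    n_genes = len(x[0])
    # Deterministic, non-uniform starting vector.
    v = [1.0 / (g + 1) for g in range(n_genes)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(scores[i] * x[i][g] for i in range(len(x)))
             for g in range(n_genes)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break  # degenerate input: no variance to explain
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


# Hypothetical toy data: two "brain" and two "liver" samples, each pair
# imagined as coming from different laboratories, measured on three genes.
toy_expression = [
    [10.0, 1.0, 5.0],   # brain, lab A
    [9.0, 2.0, 5.5],    # brain, lab B
    [1.0, 10.0, 5.0],   # liver, lab A
    [2.0, 9.0, 4.5],    # liver, lab B
]
toy_scores = first_component_scores(toy_expression)
```

In this toy case the two brain samples receive scores of one sign and the two liver samples the other, which is the kind of separation by tissue rather than by laboratory that the highlights describe.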

Introduction

RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. With the availability of RNA-seq, two strategies are commonly used to identify these effects: (1) expression quantitative trait loci (eQTL) mapping to [...] in the different genotype classes. eQTL data on many tissues and many different samples should be available, since this would permit eQTL mapping and ASE analyses on rare and low-frequency variants within different cell types. This is especially important for the functional interpretation of clinically important rare variants (recessive Mendelian mutations, where the mutant alleles have appreciable frequencies in the general population [18]), but will also aid in the classification of variants of unknown significance [19].

Deelen et al. Genome Medicine (2015) 7:30
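As an illustration of what an ASE analysis tests, the sketch below applies an exact two-sided binomial test to the reference and alternative read counts at one heterozygous site. This is a simplified stand-in, not the method used in the paper: the function name, the default expected fraction of 0.5 and the absence of any correction for mapping bias or overdispersion are all assumptions.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    ref_reads, alt_reads: RNA-seq read counts carrying each allele at a
    heterozygous site. Under no allele-specific expression, the reference
    fraction is assumed to equal expected_ref_fraction.
    Intended for modest read depths (a few hundred reads at most).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    p = expected_ref_fraction
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome at least as unlikely as observed.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

A site with balanced counts yields a p-value near 1, whereas a site where nearly all reads carry one allele yields a small p-value, which is the pattern expected for a variant with a strong allele-specific effect.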
