Abstract
Recent advances in the cost-efficiency of sequencing technologies have enabled combined DNA and RNA sequencing of human individuals at population scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project, we show that computational analysis of the DNA sequences around splice sites and poly-A signals can explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA processing, cataloguing and classifying alleles that change the affinity of core RNA elements for their recognizing factors. The in silico models we employ further suggest that RNA editing can moonlight as a splicing modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq data combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but otherwise similar to their annotated counterparts.
Highlights
Recent advances in the cost-efficiency of sequencing technologies have enabled combined DNA and RNA sequencing of human individuals at population scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable.
To investigate the molecular mechanisms that cause splicing variation between populations, we focused on variants that directly affect the affinity of annotated splice sites, considering an informative sequence of 9 nt for splice donors, including the GT dinucleotide, and 27 nt for splice acceptors, including the AG dinucleotide and the typical area of the preceding polypyrimidine tract.
The frequency of single nucleotide polymorphisms (SNPs) at a given position of the splice site sequence is negatively correlated with the information content of the consensus motif at that position, and the dinucleotides involved in the splicing reaction are mostly free of sequence polymorphisms (Fig. 1a,b).
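The window definitions and the notion of per-position information content in the highlights above can be illustrated with a minimal Python sketch. It is not the authors' tool. The window lengths (9 nt and 27 nt) come from the text, but the split between exonic and intronic bases (3 exonic nt in each window), the 0-based coordinate convention, the function names, and the toy sequence are assumptions made purely for illustration. Information content is computed as 2 minus the Shannon entropy over A, C, G, T, without any small-sample correction.

```python
import math
from collections import Counter

DONOR_LEN = 9       # window length stated in the text
ACCEPTOR_LEN = 27   # window length stated in the text

# ASSUMPTION: the text does not say how each window is split between
# exon and intron; these offsets are illustrative only.
DONOR_EXONIC = 3     # 3 exonic nt + 6 intronic nt (GT at window[3:5])
ACCEPTOR_EXONIC = 3  # 24 intronic nt (AG at window[22:24]) + 3 exonic nt


def donor_window(seq, intron_start):
    """Return the donor window; intron_start is the 0-based index of the G in GT."""
    start = intron_start - DONOR_EXONIC
    if start < 0 or start + DONOR_LEN > len(seq):
        raise ValueError("donor window falls outside the sequence")
    return seq[start:start + DONOR_LEN]


def acceptor_window(seq, intron_end):
    """Return the acceptor window; intron_end is the 0-based index of the first exonic base."""
    start = intron_end - (ACCEPTOR_LEN - ACCEPTOR_EXONIC)
    if start < 0 or start + ACCEPTOR_LEN > len(seq):
        raise ValueError("acceptor window falls outside the sequence")
    return seq[start:start + ACCEPTOR_LEN]


def information_content(windows):
    """Per-position information in bits (2 - Shannon entropy) over aligned windows."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Toy example (hypothetical sequence, not real data).
exon1 = "ACGCAG"
intron = "GTAAGT" + "TTTCTTTTCTTTTTTCTTTCAG"
exon2 = "GTCAAC"
seq = exon1 + intron + exon2

donor = donor_window(seq, len(exon1))
acceptor = acceptor_window(seq, len(exon1) + len(intron))
print(donor, acceptor)
```

With windows collected from many annotated splice sites, the values returned by `information_content` could then be compared against per-position SNP counts to examine the negative correlation described above; that comparison is left out here because it depends on the variant data at hand.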
Summary
Recent advances in the cost-efficiency of sequencing technologies have enabled combined DNA and RNA sequencing of human individuals at population scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. At the molecular level, RNA processing reactions such as splicing and 3′-end cleavage rely on the recognition of the corresponding core RNA elements by different factors involved in transcript processing, i.e., components of the splicing machinery (e.g., U1 and U2) that target the splice site sequences in order to remove introns[2], and polyadenylation signals that correspondingly bind the Cleavage/Polyadenylation Specificity Factor (CPSF) to initiate 3′-end formation[3,4]. The advent of high-throughput sequencing technologies heralded a new generation of population-scale projects that analyse combined DNA and RNA sequencing across multiple individuals. Such studies generally focus on identifying which genetic elements are statistically associated with a certain phenotype (usually defined as a quantitative trait locus (QTL) resolved at gene, transcript, or exon level) rather than building hypotheses about how these phenotypic changes are mechanistically projected from the DNA to the RNA molecules[24,25,26,27,28,29,30,31]. Our studies describe a comprehensive classification and comparison of the different ways in which RNA processing can be affected by these sources of sequence variation and serve as a reference for forthcoming mechanistic studies on RNA regulation by minority alleles.