Abstract
We uncovered the diversity of non-canonical splice sites in the human transcriptome using deep transcriptome profiling. We mapped a total of 3.7 billion human RNA-seq reads and developed a set of stringent filters to avoid false non-canonical splice site detections. We identified 184 splice sites with non-canonical dinucleotides and U2/U12-like consensus sequences. We selected 10 of these U2/U12-like non-canonical splice site events and successfully validated 9 of them via reverse transcriptase-polymerase chain reaction and Sanger sequencing. Analyses of the 184 U2/U12-like non-canonical splice sites indicate that 51% of them are not annotated in GENCODE. In addition, 28% of them are conserved in mouse and 76% are involved in alternative splicing events, some of them with tissue-specific alternative splicing patterns. Interestingly, our analysis identified some U2/U12-like non-canonical splice sites that are converted into canonical splice sites by RNA A-to-I editing. Moreover, the U2/U12-like non-canonical splice sites have a differential distribution of splicing regulatory sequences, which may contribute to their recognition and regulation. Our analysis provides a high-confidence group of U2/U12-like non-canonical splice sites, which exhibit distinctive features among the total human splice sites.
Highlights
Most genes in higher eukaryotes are interrupted by noncoding sequences, called introns, which are precisely excised from pre-mRNAs during splicing.
To get a comprehensive landscape of non-canonical splice sites across the different human tissues, we analyzed the alignments of 1.2 billion RNA-seq reads from a mixture of 16 human tissues (Illumina Body Map 2.0) to the human reference genome and cDNA/expressed sequence tag (EST) alignments, and we filtered out non-canonical splice sites that have single nucleotide polymorphisms (SNPs) or indels reported in SNPdb135 [34] (Figure 1A).
We performed a comprehensive analysis of human non-canonical splice sites based on deep transcriptome sequencing data generated by RNA-seq.
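The variant filter mentioned in the highlights can be illustrated with a minimal sketch. This is not the authors' pipeline. The `SpliceSite` record, the `filter_variant_sites` function, and the choice to check only the two dinucleotide positions are assumptions made for illustration, since the text does not state how far from the splice site a reported SNP or indel was considered.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class SpliceSite(NamedTuple):
    """Hypothetical record for one candidate splice site."""
    chrom: str
    start: int          # 1-based position of the first dinucleotide base (assumed convention)
    dinucleotide: str   # e.g. "GA" for a non-canonical donor


def filter_variant_sites(
    sites: Iterable[SpliceSite],
    variant_positions: Set[Tuple[str, int]],
) -> List[SpliceSite]:
    """Keep only sites whose two dinucleotide bases do not overlap a known variant.

    variant_positions is assumed to be a set of (chrom, position) pairs
    collected beforehand from a variant database.
    """
    kept = []
    for site in sites:
        # Both bases of the dinucleotide are checked (an assumption of this sketch).
        occupied = {(site.chrom, site.start), (site.chrom, site.start + 1)}
        if occupied.isdisjoint(variant_positions):
            kept.append(site)
    return kept


candidates = [
    SpliceSite("chr1", 100, "GA"),  # second base overlaps a known variant
    SpliceSite("chr1", 200, "AT"),  # no overlap
]
known_variants = {("chr1", 101)}
passing = filter_variant_sites(candidates, known_variants)
```

The rationale for such a filter is that an apparent non-canonical dinucleotide in the reference genome may simply reflect a polymorphism, so the individual who was sequenced could carry a canonical site at that position.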
Summary
Most genes in higher eukaryotes are interrupted by noncoding sequences, called introns, which are precisely excised from pre-mRNAs during splicing. Proper intron recognition and removal rely on consensus sequences located at the intron/exon boundaries. Dinucleotide sequences at these boundaries have been found to be strongly conserved and relevant for proper splicing [3,4,5]. Most introns belong to the so-called U2-type, which are spliced by the major spliceosome and are flanked by GT–AG splice site dinucleotides. About 0.4% of the human splice sites belong to the U12 type. These introns are processed by the minor spliceosome and, even though they were first described to have AT–AC dinucleotides at the intron/exon boundaries, the vast majority of them contain GT–AG sites [7]. The AT–AC sites comprise only ∼0.09% of the splice sites [6].
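As a minimal sketch of the dinucleotide classes described above, the following Python function labels an intron by its first and last two bases. The function name `classify_intron` and the label strings are hypothetical, and the input is assumed to be the intron sequence on the transcribed strand. As the text notes, most U12-type introns also carry GT–AG, so boundary dinucleotides alone cannot assign an intron to the U2 or U12 type.

```python
def classify_intron(intron_seq: str) -> str:
    """Label an intron by its terminal dinucleotides (illustrative only)."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence must contain at least 4 bases")

    donor, acceptor = seq[:2], seq[-2:]   # 5' and 3' boundary dinucleotides

    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG"                    # pair flanking most introns
    if (donor, acceptor) == ("AT", "AC"):
        return "AT-AC"                    # pair first described for U12-type introns
    return "other"                        # any other dinucleotide pair


labels = [
    classify_intron("GTAAGTCCCAG"),
    classify_intron("ATATCCTTAC"),
    classify_intron("GCAAGTTTAG"),
]
```

Distinguishing U2-like from U12-like sites would additionally require scoring the wider consensus sequence around each boundary, which this sketch does not attempt.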