Abstract

We uncovered the diversity of non-canonical splice sites in the human transcriptome using deep transcriptome profiling. We mapped a total of 3.7 billion human RNA-seq reads and developed a set of stringent filters to avoid false non-canonical splice site detections. We identified 184 splice sites with non-canonical dinucleotides and U2/U12-like consensus sequences. We selected 10 of the U2/U12-like non-canonical splice site events identified here and successfully validated 9 of them via reverse transcriptase-polymerase chain reaction and Sanger sequencing. Analyses of the 184 U2/U12-like non-canonical splice sites indicate that 51% of them are not annotated in GENCODE. In addition, 28% of them are conserved in mouse and 76% are involved in alternative splicing events, some of them with tissue-specific alternative splicing patterns. Interestingly, our analysis identified some U2/U12-like non-canonical splice sites that are converted into canonical splice sites by RNA A-to-I editing. Moreover, the U2/U12-like non-canonical splice sites have a differential distribution of splicing regulatory sequences, which may contribute to their recognition and regulation. Our analysis provides a high-confidence group of U2/U12-like non-canonical splice sites, which exhibit distinctive features among the total human splice sites.

Highlights

  • Most genes in higher eukaryotes are interrupted by noncoding sequences, called introns, which are precisely excised from pre-mRNAs during splicing

  • To obtain a comprehensive landscape of non-canonical splice sites across human tissues, we analyzed the alignments of 1.2 billion RNA-seq reads from a mixture of 16 human tissues (Illumina Body Map 2.0) to the human reference genome and cDNA/expressed sequence tag (EST) alignments, and we filtered out non-canonical splice sites that have single nucleotide polymorphisms (SNPs) or indels reported in SNPdb135 [34] (Figure 1A)

  • We performed a comprehensive analysis of human non-canonical splice sites based on deep transcriptome sequencing data generated by RNA-seq


Summary

Introduction

Most genes in higher eukaryotes are interrupted by noncoding sequences, called introns, which are precisely excised from pre-mRNAs during splicing. Proper intron recognition and removal rely on consensus sequences located at the intron/exon boundaries. Dinucleotide sequences at these boundaries have been found to be strongly conserved and relevant for proper splicing [3,4,5]. Most introns belong to the so-called U2 type, which are spliced by the major spliceosome and are flanked by GT–AG splice site dinucleotides. About 0.4% of the human splice sites belong to the U12 type. These introns are processed by the minor spliceosome and, even though they were first described as having AT–AC dinucleotides at the intron/exon boundaries, the vast majority of them contain GT–AG sites [7]. The AT–AC sites comprise only ∼0.09% of the splice sites [6].
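
To make the dinucleotide terminology concrete, the following minimal Python sketch labels an intron by the first two and last two nucleotides of its sequence and tallies the labels over a set of introns. It is an illustration only, not the pipeline used in the study: the function names `boundary_dinucleotides` and `tally_boundaries` and the toy sequences are hypothetical, and the sketch assumes that each input string is the intron sequence alone, written 5' to 3' on the transcribed strand.

```python
from collections import Counter


def boundary_dinucleotides(intron_seq: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'.

    Assumes (hypothetically) that intron_seq holds only the intron,
    5' to 3' on the transcribed strand. RNA input (U) is converted to DNA (T).
    """
    seq = intron_seq.upper().replace("U", "T")
    if len(seq) < 4:
        raise ValueError("intron sequence must contain at least 4 nucleotides")
    donor = seq[:2]      # first two intronic nucleotides (5' splice site)
    acceptor = seq[-2:]  # last two intronic nucleotides (3' splice site)
    return f"{donor}-{acceptor}"


def tally_boundaries(introns):
    """Count how often each donor-acceptor pair occurs in a list of introns."""
    return Counter(boundary_dinucleotides(s) for s in introns)


# Toy sequences invented for illustration; they are not real introns.
toy_introns = [
    "GTAAGTCCCTTTCAG",  # GT-AG boundaries
    "guaagucccuuucag",  # same boundaries, given as lowercase RNA
    "ATATCCTTTCCTTAC",  # AT-AC boundaries
    "GAAAGTCCCTTTCAG",  # GA-AG, neither of the two pairs named above
]
counts = tally_boundaries(toy_introns)
```

The dinucleotide pair alone does not say whether an intron is U2-type or U12-type, since most U12-type introns also carry GT–AG boundaries; that distinction depends on the wider consensus sequences around the splice sites, which this sketch does not examine.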
