Abstract

BackgroundMost eukaryotic genes comprise exons and introns thus requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis. U2 and U12 spliceosomes catalyze this step by recognizing motifs on the transcript in order to remove the introns. A process which is dependent on precise definition of exon-intron borders by splice sites, which are consequently highly conserved across species. Only very few combinations of terminal dinucleotides are frequently observed at intron ends, dominated by the canonical GT-AG splice sites on the DNA level.ResultsHere we investigate the occurrence of diverse combinations of dinucleotides at predicted splice sites. Analyzing 121 plant genome sequences based on their annotation revealed strong splice site conservation across species, annotation errors, and true biological divergence from canonical splice sites. The frequency of non-canonical splice sites clearly correlates with their divergence from canonical ones indicating either an accumulation of probably neutral mutations, or evolution towards canonical splice sites. Strong conservation across multiple species and non-random accumulation of substitutions in splice sites indicate a functional relevance of non-canonical splice sites. The average composition of splice sites across all investigated species is 98.7% for GT-AG, 1.2% for GC-AG, 0.06% for AT-AC, and 0.09% for minor non-canonical splice sites. RNA-Seq data sets of 35 species were incorporated to validate non-canonical splice site predictions through gaps in sequencing reads alignments and to demonstrate the expression of affected genes.ConclusionWe conclude that bona fide non-canonical splice sites are present and appear to be functionally relevant in most plant genomes, although at low abundance.

Highlights

  • Most eukaryotic genes comprise exons and introns requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis

  • Based on terminal dinucleotides in introns, splice site combinations were classified as canonical (GT-AG) or non-canonical if they diverged from the canonical motif

  • The number of introns per gene was only slightly reduced to 4.15 when only introns enclosed by coding exons were considered for this analysis. Our investigation of these 121 plant genome sequences revealed a huge variety of different non-canonical splice site combinations (Additional files 6 and 7)

Read more

Summary

Introduction

Most eukaryotic genes comprise exons and introns requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis. Historical debates concerning the evolutionary history of introns led to the “introns-first-hypothesis” which proposes that introns were already present in the last common ancestor of all eukaryotes [3, 7] Pucker and Brockington BMC Genomics (2018) 19:980 The removal of these introns during pre-mRNA processing is a complex and expensive step, which involves 5 snoRNAs and over 150 proteins building the spliceosome [14]. Conserved cis-regulatory sequences are required for the correct spliceosome recruitment to designated splice sites [20,21,22] These sequences pose potential for deleterious mutations [4], some intron positions are conserved between very distant eukaryotic species like Homo sapiens and Arabidopsis thaliana [23]

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call