Abstract

The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif, combined with the lack of a large set of experimentally verified BPs, complicates the task of modeling it in silico and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias, we obtained a set of motifs that agree well with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in humans. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed and appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this crucial but so far disregarded element carries information that can complement current acceptor site models.

Highlights

  • Pre-mRNA splicing, which is essential for the production of functional mRNAs, is a co-transcriptional set of reactions catalyzed by a large ribonucleoprotein complex, the spliceosome, composed of five small nuclear RNAs and more than a hundred proteins [1,2].

  • We have developed a methodology for mammalian branch point prediction based on a machine-learning algorithm, which shows improved accuracy over previously published methods

  • These findings might prove useful for a better understanding of how splicing-associated mutations can lead to disease

Summary

Introduction

Pre-mRNA splicing, which is essential for the production of functional mRNAs, is a co-transcriptional set of reactions catalyzed by a large ribonucleoprotein complex, the spliceosome, composed of five small nuclear RNAs (snRNAs) and more than a hundred proteins [1,2]. In addition to these core factors, splicing often depends on other proteins that can either activate or repress signal recognition and thereby play an important role in the regulation of specific events [3,4]. Distant BPs (DBPs) typically have a long adjacent polypyrimidine tract (PPT) downstream and have been associated with alternative splicing (AS), in particular with mutually exclusive exons [21,23]. For both distant and proximal BPs, the region between the BP and the 3′ splice site (3SS) is usually devoid of AG dinucleotides [22] (Figure 1).
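
The observation that the stretch between the BP and the 3′ splice site is usually free of AG dinucleotides can be illustrated with a short sketch. The Python function below is a hypothetical helper, not the method used in this work: it assumes the intron is given 5′ to 3′ and ends with the splice-site AG, and it takes the AG-free stretch to begin immediately after the closest upstream AG. The function name and the toy sequence are illustrative assumptions.

```python
def ag_free_zone_start(intron: str) -> int:
    """Return the 0-based index where the AG-free stretch upstream of
    the 3' splice site begins (hypothetical helper for illustration).

    Assumption: `intron` is written 5'->3' and ends with the terminal AG.
    """
    seq = intron.upper().replace("U", "T")
    if not seq.endswith("AG"):
        raise ValueError("expected the intron to end with the 3' splice site AG")
    # Last AG that lies entirely upstream of the terminal AG.
    last_ag = seq.rfind("AG", 0, len(seq) - 2)
    # No upstream AG: the whole intron body is AG-free.
    return 0 if last_ag == -1 else last_ag + 2


# Toy sequence (invented for this example, not taken from the study).
intron = "GTAAGTCCAGTTCTAACTTTCCTTTTCAG"
start = ag_free_zone_start(intron)
zone = intron[start:-2]  # AG-free stretch, excluding the terminal AG
```

Under these assumptions, any candidate BP would be searched for inside `zone`; a real analysis would apply the exact zone definition given in the Methods rather than this simplification.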

