Abstract

Small exon length variations, caused by the use of alternative donor or acceptor splice sites that lie in very close proximity on the pre-mRNA, are among the most common splice variations. Among these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model, which assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites, correctly predicts the relative abundances of different small length variations at both boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, only at the second site, or at both sites, so that three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations result from stochastic binding of the spliceosome at neighboring splice sites, and that they occur when nearby alternative splice sites have similar affinity for the splicing machinery.
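The stochastic binding model described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate site has a score that behaves like a natural-log affinity, so that the usage of a site is proportional to the exponential of its score. The function names and the `min_minor_fraction` cutoff are hypothetical choices made for this illustration and are not taken from the paper.

```python
import math


def splice_site_usage(log_affinities):
    """Fraction of transcripts spliced at each candidate site.

    Assumption: usage is proportional to exp(log-affinity), i.e. the
    splicing machinery picks a site in proportion to its affinity.
    """
    top = max(log_affinities)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in log_affinities]
    total = sum(weights)
    return [w / total for w in weights]


def classify_tandem_site(score_first, score_second, min_minor_fraction=0.05):
    """Label a tandem acceptor by the predicted usage of its two sites.

    min_minor_fraction is a hypothetical cutoff below which the weaker
    site is treated as effectively unused.
    """
    p_first, p_second = splice_site_usage([score_first, score_second])
    if p_second < min_minor_fraction:
        return "first site only"
    if p_first < min_minor_fraction:
        return "second site only"
    return "both sites"


if __name__ == "__main__":
    print(splice_site_usage([0.0, 0.0]))
    print(classify_tandem_site(5.0, 0.0))
    print(classify_tandem_site(1.0, 0.0))
```

Under this assumption, two sites with similar scores are both used at appreciable rates, while a large score difference leaves only the stronger site in use, which matches the qualitative statement in the abstract.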

Highlights

  • We investigated to what extent we could predict the category of each NAGNAG site by using the weight matrix (WM) constructed from the invariant acceptor sites
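As an illustration of how a weight matrix can be used for this kind of prediction, the sketch below builds a log-odds matrix from aligned sequences and scores the two candidate acceptors of a tandem site. The training sequences, the six-nucleotide window ending at the AG, the pseudocount, and the uniform background are all hypothetical toy choices; the paper's actual matrix is built from invariant acceptor sites and may use a different window and background.

```python
import math

BASES = "ACGT"


def build_weight_matrix(aligned_sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length aligned sequences."""
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for seq in aligned_sites:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log((counts[b] / total) / background) for b in BASES}
        )
    return matrix


def score(matrix, window):
    """Sum of per-position log-odds for a window as long as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def tandem_score_difference(matrix, sequence, first_ag_end):
    """Score of the first acceptor minus score of the second acceptor.

    first_ag_end is the index just past the first AG; the second AG
    ends three nucleotides further downstream.
    """
    width = len(matrix)
    first = sequence[first_ag_end - width:first_ag_end]
    second = sequence[first_ag_end + 3 - width:first_ag_end + 3]
    return score(matrix, first) - score(matrix, second)


if __name__ == "__main__":
    # Hypothetical toy training set, not real acceptor data.
    toy_sites = ["TTTCAG", "CTTCAG", "TCCTAG", "TTCCAG"]
    wm = build_weight_matrix(toy_sites)
    print(tandem_score_difference(wm, "TTTTTTCAGCAG", first_ag_end=9))
```

A positive difference favors the first acceptor and a negative one favors the second; a difference near zero is the case in which both sites would be expected to be used.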

Introduction

Anticipating that the sequencing and initial annotation of the human [1,2] and mouse [3] genomes will not be able to uncover all the complexities of mammalian gene structures, several groups have focused on producing high-quality, annotated transcript data such as the Riken Clone Collection [4,5], the Mammalian Gene Collection [6], the NCBI Reference Sequence [7], and the NCBI unfinished high-throughput cDNA sequences. The choice of splice site appears to be determined by a combination of (1) the strength of the splice signal, i.e., the affinity for the splicing machinery of the sequence around the splice site, (2) structural constraints set on the interactions of spliceosomal components by the lengths and sequences of introns and exons and possibly by the secondary structure of the mRNA, (3) the presence of enhancer or repressor elements that may serve, respectively, to activate a weak splice site or repress a strong one, and (4) the effective concentrations of splicing factors such as SR proteins and heterogeneous nuclear ribonucleoproteins that can be regulated through post-translational modifications such as phosphorylation [15]. The high estimates of the frequency of alternative splicing in human [8,14,18] and mouse [13] genes raise the question to [...]

Editors: Judith Blake (The Jackson Laboratory, US), John Hancock (MRC-Harwell, UK), Bill Pavan (NHGRI-NIH, US), and Lisa Stubbs (Lawrence Livermore National Laboratory, US), together with PLoS Genetics EIC Wayne Frankel (The Jackson Laboratory, US)
