Abstract

In kinetoplastids, protein-coding genes are transcribed polycistronically by RNA polymerase II. Individual mature mRNAs are generated from polycistronic precursors by 5′ trans splicing of a 39-nt capped leader RNA and 3′ polyadenylation. It was previously known that trans splicing generally occurs at an AG dinucleotide downstream of a polypyrimidine tract, and that polyadenylation is coupled to downstream trans splicing. The few polyadenylation sites that had been examined were 100–400 nt upstream of the polypyrimidine tract that marks the adjacent trans splice site. We wished to define the sequence requirements for trypanosome mRNA processing more precisely and to generate a predictive algorithm. By scanning all available Trypanosoma brucei cDNAs for splicing and polyadenylation sites, we found that trans splicing generally occurs at the first AG following a polypyrimidine tract of 8–25 nt, giving rise to 5′-UTRs with a median length of 68 nt. We also found that, in general, polyadenylation occurs at a position with one or more A residues located between 80 and 140 nt upstream of the downstream polypyrimidine tract. These data were used to calibrate free parameters in a grammar model with distance constraints, enabling prediction of polyadenylation and trans splice sites for most protein-coding genes in the trypanosome genome. The data from the genome analysis and the program are available from: http://web.cgb.ki.se/daniel/splicemodel.php.
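The positional rules summarized above can be illustrated with a small heuristic scan. The sketch below is not the grammar model described in the abstract; it is a simplified stand-in that assumes an uninterrupted polypyrimidine tract of 8–25 nt, takes the first AG after the tract as the candidate trans splice site, and lists A residues lying 80–140 nt upstream of the tract start as candidate polyadenylation positions. The names `Prediction` and `predict_sites` are hypothetical and chosen only for this example.

```python
import re
from typing import List, NamedTuple

# Assumption: a tract is 8-25 consecutive pyrimidines (C/T) with no
# interruptions. The published model may tolerate purines in the tract.
PPT = re.compile(r"[CT]{8,25}")


class Prediction(NamedTuple):
    tract_start: int             # 0-based index of the first tract nucleotide
    tract_end: int               # 0-based index one past the last tract nucleotide
    splice_ag: int               # 0-based index of the 'A' in the first downstream AG
    polya_candidates: List[int]  # 0-based indices of A residues in the upstream window


def predict_sites(seq: str, min_dist: int = 80, max_dist: int = 140) -> List[Prediction]:
    """Scan a DNA/RNA sequence and apply the distance rules from the abstract."""
    seq = seq.upper().replace("U", "T")
    predictions = []
    for match in PPT.finditer(seq):
        # Trans splice site: first AG dinucleotide after the tract.
        ag = seq.find("AG", match.end())
        if ag == -1:
            continue
        # Polyadenylation window: positions min_dist..max_dist nt upstream
        # of the tract start (both bounds inclusive).
        lo = max(0, match.start() - max_dist)
        hi = match.start() - min_dist
        candidates = [i for i in range(lo, hi + 1) if seq[i] == "A"] if hi >= 0 else []
        predictions.append(Prediction(match.start(), match.end(), ag, candidates))
    return predictions


if __name__ == "__main__":
    # Toy sequence: one A residue 100 nt upstream of a 10-nt tract.
    toy = "G" * 100 + "A" + "G" * 99 + "CTCTCTCTCT" + "GGAG" + "ATGGCC"
    for p in predict_sites(toy):
        print(p)
```

Each pyrimidine run is scored independently here, with no weighting between competing tracts or candidate sites; calibrating such choices against cDNA data is what the abstract attributes to the grammar model.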
