Abstract
The termination of mature eukaryotic mRNAs occurs at specific polyadenylation sites located downstream from stop codons in the 3'-untranslated region (UTR). An accurate delineation of these sites is essential for the study of 3'-UTR-based gene regulation and for the design of pertinent probes for transcriptome analysis. Although typical poly(A) sites are located between 0 and 2 kb from the stop codon, EST sequence analyses have identified sites located at unexpectedly long ranges (5-10 kb) in a number of genes. Here we perform a complete mapping of EST and full-length cDNA sequences on the mouse and human genomes to identify putative poly(A) sites extending beyond annotated 3'-ends and into intergenic regions. We introduce several quality parameters for poly(A) site prediction and train a classification tree to assign P-values to predicted sites. We observe a higher-than-background level of high-scoring sites up to 12-15 kb past the stop codon, in both human and mouse. This leads to an estimate of about 5000 human genes having unreported 3'-end extensions and about 3500 novel polyadenylated transcripts lying in regions currently annotated as "intergenic". These high-scoring, long-range poly(A) sites, which correspond to novel transcripts and gene extensions, should be incorporated into current human and mouse gene repositories.