Abstract
The performance of gene finding from genome sequences depends strongly on the accuracy of splice site prediction. Recent gene finding programs, however, still fall short of sufficient accuracy. Improving splice site prediction requires understanding the splicing mechanism and building a model from clear experimental evidence. For this purpose, genomic full-length precursor mRNA sequences (FL-pre-mRNAs), together with expression information, are indispensable. FL-pre-mRNAs contain the entire gene structure, including the 5' and 3' ends of the mRNA, the initiation codon, splice sites, the stop codon, and polyadenylation signals. They also contain all alternative splice sites except those associated with the first or last exons of alternative transcripts. However, no database of FL-pre-mRNAs has been reported in previous work. Aligning expressed sequence tags (ESTs) to genomic sequences has been a common method for gene prediction and splice site analysis (1, 3). ESTs are not suitable for collecting FL-pre-mRNAs, however, because they are partial sequences and the 5' ends of the mRNAs are unknown in most cases; even EST contigs clustered in UniGene (2) or the RefSeq database (4) cannot be assumed to be full-length. This is because ESTs are single sequencing reads that can contain mutations, insertions, or deletions (5). As genomic and EST sequence data have grown, computational approaches have become one of the methods used to annotate sequences as putative genes or ORFs. In contrast, the GenBank database has accumulated entries in which complete genomic protein-coding sequences or full-length mRNA sequences are characterized by experimental evidence. These sequences and their annotations (the positions of gene boundaries and functional signals) rest on information more reliable than in silico prediction and are therefore expected to be of high quality. We therefore constructed datasets with experimental annotation from the GenBank database for gene structure prediction and splice site analysis.
We also discuss an analysis of constitutive and alternative splice sites and their correlation with several biological descriptors.