Abstract
Background: Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span several genomic clones.

Results: We present an mRNA/DNA homology-based gene structure prediction tool, GIGOgene. We use a new splice-enhanced global alignment algorithm with affine gap penalties, running in linear memory, for high-quality annotation of splice sites. Our tool includes a novel algorithm that assembles partial gene structure predictions using interval graphs. GIGOgene exhibited a sensitivity of 99.08% and a specificity of 99.98% on the Genie learning set, and demonstrated a higher quality of gene structural prediction than Sim4, est2genome, Spidey, Galahad and BLAT, including when genes contained micro-exons and non-canonical splice sites. GIGOgene showed an acceptable loss of prediction quality when confronted with a noisy Genie learning set simulating ESTs.

Conclusion: GIGOgene shows a higher quality of gene structure prediction for mRNA/DNA spliced alignment than other available tools.
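To make the phrase "affine gap penalty global alignment in linear memory" concrete, here is a minimal sketch of the textbook three-state (Gotoh-style) recurrence that keeps only two rows of the dynamic-programming matrices. It is not GIGOgene's algorithm: it has no splice-site awareness and returns only the optimal score, not the alignment. The function name and the scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`) are assumptions chosen for illustration.

```python
NEG = float("-inf")


def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Only the previous and current rows are stored, so memory is O(len(b)).
    """
    n = len(b)
    # Row 0: m = last column is a match/mismatch, x = gap in b, y = gap in a.
    prev_m = [0] + [NEG] * n
    prev_x = [NEG] * (n + 1)
    prev_y = [NEG] + [gap_open + (j - 1) * gap_extend for j in range(1, n + 1)]

    for i in range(1, len(a) + 1):
        cur_m = [NEG] * (n + 1)
        cur_x = [gap_open + (i - 1) * gap_extend] + [NEG] * n
        cur_y = [NEG] * (n + 1)
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Align a[i-1] with b[j-1].
            cur_m[j] = max(prev_m[j - 1], prev_x[j - 1], prev_y[j - 1]) + s
            # Align a[i-1] with a gap: open a new gap or extend one.
            cur_x[j] = max(prev_m[j] + gap_open,
                           prev_x[j] + gap_extend,
                           prev_y[j] + gap_open)
            # Align b[j-1] with a gap: open a new gap or extend one.
            cur_y[j] = max(cur_m[j - 1] + gap_open,
                           cur_y[j - 1] + gap_extend,
                           cur_x[j - 1] + gap_open)
        prev_m, prev_x, prev_y = cur_m, cur_x, cur_y

    return max(prev_m[n], prev_x[n], prev_y[n])
```

Recovering the full alignment in linear memory requires an additional divide-and-conquer step (for example, in the style of Hirschberg), which this sketch omits.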
Highlights
Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span several genomic clones
This study shows that the GIGOgene program has the highest structural prediction sensitivity and specificity in this case
Using a homology-based approach, we have designed a program for eukaryotic gene structural annotation
Summary
Experiments with Genie learning set
GIGOgene was tested, along with Spidey, est2genome, Sim4, Galahad and BLAT, on 462 mRNA transcripts of the human Genie multi-exon annotated learning set http://.

If a number of nucleotides are inserted between exon boundaries in the mRNA, they can be interpreted as micro-exon(s) with non-canonical splice sites, rather than enforcing the GT-AG rule in the genomic clone as Sim4 and est2genome do. That is why these two applications perform rather poorly in micro-exon testing [see Subsection Experiments with micro-exon detection], where they sacrifice micro-exons to enforce the canonical splice rule.

Run-time comparison
In Table 5 we compare the running time that different programs require to annotate the set of micro-exon-containing genes mentioned [see Subsection Experiments with micro-exon detection]. The comparison of BLAT and GIGOgene on the annotation of the whole draft sequence of Chromosome 22 agrees well with the previously observed tendency: GIGOgene takes longer than BLAT for gene structural prediction but yields higher prediction quality.
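The GT-AG rule discussed above can be illustrated with a short check of the intron boundaries implied by a predicted exon chain. This is a minimal sketch under stated assumptions, not part of GIGOgene or any of the compared tools: the function name `intron_splice_sites` is hypothetical, exon coordinates are assumed to be 0-based, half-open, sorted, and on the forward strand of the genomic sequence.

```python
def intron_splice_sites(genomic, exons):
    """Report donor/acceptor dinucleotides for each intron between exons.

    genomic: genomic DNA sequence as a string (forward strand).
    exons:   sorted list of (start, end) pairs, 0-based, half-open.
    Returns a list of (donor, acceptor, is_canonical) tuples, where
    is_canonical is True only for a GT donor with an AG acceptor.
    """
    sites = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start]
        donor = intron[:2].upper()      # first two bases of the intron
        acceptor = intron[-2:].upper()  # last two bases of the intron
        sites.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return sites


# Toy example: three exons separated by one canonical and one
# non-canonical intron (sequences are made up for illustration).
genomic = "AAA" + "GTCCCAG" + "TTT" + "GCAAAT" + "CGG"
exons = [(0, 3), (10, 13), (19, 22)]
for donor, acceptor, canonical in intron_splice_sites(genomic, exons):
    print(donor, acceptor, "canonical" if canonical else "non-canonical")
```

A tool that forces every intron to pass this check will shift exon boundaries or drop short exons to satisfy it, which is the failure mode on micro-exons described above.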