Abstract

BackgroundPredicting and proper ranking of canonical splice sites (SSs) is a challenging problem in bioinformatics and machine learning communities. Any progress in SSs recognition will lead to better understanding of splicing mechanism. We introduce several new approaches of combining a priori knowledge for improved SS detection. First, we design our new Bayesian SS sensor based on oligonucleotide counting. To further enhance prediction quality, we applied our new de novo motif detection tool MHMMotif to intronic ends and exons. We combine elements found with sensor information using Naive Bayesian Network, as implemented in our new tool SpliceScan.ResultsAccording to our tests, the Bayesian sensor outperforms the contemporary Maximum Entropy sensor for 5' SS detection. We report a number of putative Exonic (ESE) and Intronic (ISE) Splicing Enhancers found by MHMMotif tool. T-test statistics on mouse/rat intronic alignments indicates, that detected elements are on average more conserved as compared to other oligos, which supports our assumption of their functional importance. The tool has been shown to outperform the SpliceView, GeneSplicer, NNSplice, Genio and NetUTR tools for the test set of human genes. SpliceScan outperforms all contemporary ab initio gene structural prediction tools on the set of 5' UTR gene fragments.ConclusionDesigned methods have many attractive properties, compared to existing approaches. Bayesian sensor, MHMMotif program and SpliceScan tools are freely available on our web site.ReviewersThis article was reviewed by Manyuan Long, Arcady Mushegian and Mikhail Gelfand.

Highlights

  • Predicting and proper ranking of canonical splice sites (SSs) is a challenging problem in bioinformatics and machine learning communities

  • Reviewers: This article was reviewed by Manyuan Long, Arcady Mushegian and Mikhail Gelfand

  • Learning and test sets cross-correlation degree study We study Bayesian sensor performance depending on the degree of cross-correlation between the learning and test sets

Read more

Summary

Introduction

Predicting and proper ranking of canonical splice sites (SSs) is a challenging problem in bioinformatics and machine learning communities. Any progress in SSs recognition will lead to better understanding of splicing mechanism. We introduce several new approaches of combining a priori knowledge for improved SS detection. We design our new Bayesian SS sensor based on oligonucleotide counting. To further enhance prediction quality, we applied our new de novo motif detection tool MHMMotif to intronic ends and exons. We combine elements found with sensor information using Naive Bayesian Network, as implemented in our new tool SpliceScan. Precise removal of introns from pre-messenger RNAs (premRNAs) by splicing is a critical step in expression of most metazoan genes. The process requires accurate recognition and pairing of 5' and 3' SSs by the splicing machinery.

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.