Abstract

Motivation: Gene identification in genomes has been a fundamental and long-standing task in bioinformatics and computational biology. Many computational methods have been developed to predict genes in prokaryotic genomes by identifying translation initiation sites (TISs) in transcript data. However, pseudo-TISs at the genome level cause these methods to produce a high number of false positive predictions. In addition, most existing tools use an unsupervised learning framework, whose predictive accuracy may depend on the specific organism chosen.

Results: In this paper, we present a supervised learning method, based on a support vector machine (SVM), to identify translation initiation sites at the genome level. The features are extracted from the sequence data by modeling the sequence segment around predicted TISs as a position specific weight matrix (PSWM). We train the parameters of our SVM on well-constructed positive and negative TIS datasets. We then apply the method to recognize translation initiation sites in E. coli and B. subtilis, and validate it on two GC-rich bacterial genomes: Pseudomonas aeruginosa and Burkholderia pseudomallei K96243. We show that our method recognizes translation initiation sites accurately at the genome level, irrespective of GC content. Furthermore, we compare our method with four existing methods and demonstrate that it outperforms them in all four organisms.
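
To make the feature-extraction idea concrete, the following is a minimal sketch of how a position specific weight matrix could turn a sequence window around a candidate TIS into a numeric feature vector. It is not the authors' implementation: the function names, the uniform background frequency of 0.25, the pseudocount of 1, and the toy sequences are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pswm(windows, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length windows.

    Assumption: uniform background (0.25 per base) and an additive
    pseudocount; the paper's abstract does not specify either choice.
    """
    length = len(windows[0])
    pswm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for w in windows:
            counts[w[pos]] += 1
        total = sum(counts.values())
        # log2 ratio of observed frequency to background frequency
        pswm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pswm

def pswm_features(window, pswm):
    """Return one log-odds score per position: a fixed-length feature vector."""
    return [pswm[i][base] for i, base in enumerate(window)]

# Toy windows ending in a start codon (hypothetical, not real genome data).
positives = ["AGGAGGTATG", "AGGAGATATG", "AGGAGGCATG", "AAGAGGTATG"]
pswm = build_pswm(positives)

true_like = pswm_features("AGGAGGTATG", pswm)    # resembles the training windows
pseudo_like = pswm_features("CCTTCCAATG", pswm)  # ATG without a matching upstream signal
print(sum(true_like) > sum(pseudo_like))
```

Vectors like `true_like` and `pseudo_like`, computed for labeled positive and negative TIS windows, are the kind of input that could then be passed to any standard SVM implementation for training.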
