Abstract
Essential genes are those an organism needs at all times and under all conditions. Identifying essential genes in bacterial genomes is important because of their vital role in synthetic biology and biomedical practice. In this paper, we developed a support vector machine (SVM)-based method to predict essential genes in bacterial genomes using only compositional features. These features are all derived from the primary sequences, i.e., nucleotide sequences and protein sequences. After training on multiple samplings of the labeled (essential or non-essential) features using a library for SVM, we obtained an average area under the ROC curve (AUC) of about 0.82 in 5-fold cross-validation for Escherichia coli and about 0.74 for Mycoplasma pulmonis. We further evaluated the performance of the proposed method on a dataset consisting of 16 bacterial genomes and achieved an average AUC of 0.76. Based on this training dataset, a model for essential gene prediction was established. Two additional independent genomes, Shewanella oneidensis RW1 and Salmonella enterica serovar Typhimurium SL1344, were used to evaluate the model; the AUC scores were 0.77 and 0.81, respectively. For the convenience of experimental scientists, a web server has been constructed, which is freely available at http://cefg.uestc.edu.cn:9999/egp.
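As an illustration of the two ingredients named above, compositional features and AUC, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not list the exact feature set, so the k-mer frequency function, the `auc` helper, and the toy sequence and scores are assumptions made for illustration only. The SVM training step is omitted; in practice the feature vectors would be passed to an SVM library.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, alphabet, k=1):
    """Relative frequency of every k-mer over `alphabet`, in a fixed order.

    Hypothetical helper: one simple kind of compositional feature.
    Windows containing symbols outside `alphabet` are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as 0.5. Labels are 1 (essential) or 0 (non-essential).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy nucleotide sequence (assumed, not from the paper's data).
nucleotide_features = kmer_frequencies("ATGGCGAAATAA", "ACGT", k=1)

# Toy classifier scores for four genes (assumed values).
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The same `kmer_frequencies` function can be applied to a protein sequence by passing the 20-letter amino acid alphabet, which gives one feature vector per gene from both sequence types.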