Abstract
The operon is a specific functional organization of genes found in bacterial genomes. Most genes within operons share common features. The support vector machine (SVM) approach is here used to predict operons at the genomic level. Four features were chosen as SVM input vectors: the intergenic distances, the number of common pathways, the number of conserved gene pairs and the mutual information of phylogenetic profiles. The analysis reveals that these common properties are indeed characteristic of the genes within operons and are different from that of non-operonic genes. Jackknife testing indicates that these input feature vectors, employed with RBF kernel SVM, achieve high accuracy. To validate the method, Escherichia coli K12 and Bacillus subtilis were taken as benchmark genomes of known operon structure, and the prediction results in both show that the SVM can detect operon genes in target genomes efficiently and offers a satisfactory balance between sensitivity and specificity.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have