Abstract

Background: Genomic islands (GIs) are clusters of alien genes found in some bacterial genomes but not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory.

Results: In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, and Streptococcus) and on a mixed dataset of all three genera. The experimental results show that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost, and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.

Conclusions: We conclude that decision tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.

Highlights

  • Genomic islands (GIs) are clusters of alien genes found in some bacterial genomes but not seen in the genomes of other strains within the same genus

  • We classify several genomic island datasets using supervised machine learning algorithms and show that, in general, the decision tree method performs better than other machine learning models, including naive Bayes, Bayesian networks, neural networks, simple logistic and support vector machines (SVMs)

  • To evaluate each of the eight features, we define the signal-to-noise ratio (G2N) as the distance between the arithmetic means of the GI and non-GI classes divided by the sum of the corresponding standard deviations, i.e., $\mathrm{G2N} = |\mu_{\mathrm{GI}} - \mu_{\mathrm{non\_GI}}| / (\sigma_{\mathrm{GI}} + \sigma_{\mathrm{non\_GI}})$
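
The G2N definition in the last highlight can be illustrated with a minimal Python sketch. The function name `g2n`, the example values, and the choice of the sample standard deviation are assumptions made here for illustration; the source does not say which standard deviation estimator was used.

```python
from statistics import mean, stdev


def g2n(gi_values, non_gi_values):
    """Signal-to-noise ratio of one feature between GI and non-GI classes.

    Computes |mean(GI) - mean(non-GI)| / (sd(GI) + sd(non-GI)).
    Assumption: sample standard deviation (n - 1 denominator).
    """
    spread = stdev(gi_values) + stdev(non_gi_values)
    if spread == 0:
        # Both classes are constant, so the ratio is undefined.
        raise ValueError("G2N is undefined when both standard deviations are zero")
    return abs(mean(gi_values) - mean(non_gi_values)) / spread


# Hypothetical values of a single feature, not data from the paper.
gi_scores = [0.8, 0.9, 1.0, 1.1, 1.2]
non_gi_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
print(f"G2N = {g2n(gi_scores, non_gi_scores):.3f}")
```

Under this definition, a larger G2N means the two class means are further apart relative to the within-class spread, so the feature separates GIs from non-GIs more cleanly.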

Summary

Introduction

Genomic islands (GIs) are clusters of alien genes found in some bacterial genomes but not seen in the genomes of other strains within the same genus. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory. GIs are clusters of genes in a chromosome that are horizontally transferred from other organisms. Since different kinds of GIs have different genetic elements, and their sizes might range from 5 to 500 kilobase pairs, it is challenging to accurately detect and characterize all GIs in any genome. With the explosive growth of fully sequenced genomes, the approach of using comparative genomics analysis to detect GIs becomes possible. Detecting GIs in such query genomes may not be applicable. Such methods may need manual selections of genomes
