Abstract

Background

Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function.

Results

We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at .

Conclusion

The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
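The initialization step described above (grouping genomes by resemblance in gene distribution) can be illustrated with a minimal sketch. The abstract does not state how SynFPS measures resemblance or which clustering algorithm it uses, so everything here is an assumption made for illustration: the positional-bin profile, the cosine similarity, the greedy grouping, the `threshold` and `bins` parameters, and the toy genomes are all hypothetical. The subsequent per-group Support Vector Machine training is not shown.

```python
from collections import Counter
from math import sqrt


def gene_distribution(genes, bins=4):
    """Profile an ordered gene list as counts of (position bin, function label).

    Assumption: 'gene distribution' is approximated by where along the
    genome each labelled gene falls; this is not the paper's definition.
    """
    counts = Counter()
    n = len(genes)
    for i, label in enumerate(genes):
        counts[(i * bins // n, label)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_genomes(genomes, threshold=0.5):
    """Greedy grouping: a genome joins the first group whose seed profile
    is at least `threshold` similar, otherwise it starts a new group."""
    groups = []  # list of (seed profile, member names)
    for name, genes in genomes.items():
        profile = gene_distribution(genes)
        for seed, members in groups:
            if cosine(profile, seed) >= threshold:
                members.append(name)
                break
        else:
            groups.append((profile, [name]))
    return [members for _, members in groups]


# Hypothetical toy genomes: ordered lists of gene function labels.
toy = {
    "phageA": ["terminase", "portal", "capsid", "tail", "lysin"],
    "phageB": ["terminase", "portal", "capsid", "tail", "holin"],
    "phageC": ["integrase", "repressor", "polymerase", "helicase"],
}
groups = cluster_genomes(toy)
```

On this toy input, `phageA` and `phageB` share four of five positional entries and land in one group, while `phageC` shares none and forms its own. In a pipeline of the kind the abstract describes, each such group would then supply the training data for its own classifier.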

Highlights

  • Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process

  • We present a novel genomic-context based method capable of predicting gene functions from a large collection of genomes

  • Functional predictions are made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment


Summary

Introduction

Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. The increasing number of completely sequenced genomes has enabled gene function predictions by means of whole-genome comparison. Existing methods such as SynBrowse [1], Vista [2], LAGAN [3], PipMaker [4] and Ensembl SyntenyView [5] provide visualization of conserved regions between two or more genome sequences for comparative analysis. However, they cannot automatically detect homologous or functionally similar genes that share no sequence similarity, so function prediction for those genes must be done manually. These methods also require the genomes being compared to be closely related. This prevents automatic analysis of a large collection of weakly related genomes and makes it impossible to inspect a genome for which no related species have been identified.
