Abstract

Proximity-based methods and co-evolution-based phylogenetic profile methods have been used successfully to identify functionally related genes. Proximity-based methods are effective for physically clustered genes, whereas the phylogenetic profile method is effective for co-occurring gene sets. However, both methods predict many false positives and false negatives. In this paper, we propose the Gene Cluster Profile Vector (GCPV) method, which combines the two approaches by using phylogenetic profiles of whole gene clusters. Moreover, the GCPV method is currently the only method that allows the relationships between gene clusters themselves to be characterized. The GCPV method groups together reasonably related operons in E. coli about 60% of the time. The method depends only minimally on the reference genome set used, and it outperforms the conventional phylogenetic profile method. Finally, we show that the method works well for predicted gene clusters from C. crescentus and can serve as an important tool not only for understanding gene function but also for elucidating the mechanisms of general biological processes.
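
The abstract does not define how a whole-cluster phylogenetic profile is built or compared, so the following is only an illustrative sketch of the general idea under stated assumptions, not the GCPV method itself. It assumes that each gene has a binary presence/absence profile over the same ordered set of reference genomes, that a cluster-level vector is the per-genome fraction of member genes with a homolog, and that clusters are compared with Pearson correlation. The function names and the toy profiles are hypothetical.

```python
from math import sqrt


def cluster_profile_vector(gene_profiles: list[list[int]]) -> list[float]:
    """Aggregate binary gene profiles into one cluster-level vector.

    Assumption (not from the paper): entry j is the fraction of the
    cluster's genes that have a homolog in reference genome j.
    """
    n_genes = len(gene_profiles)
    n_genomes = len(gene_profiles[0])
    return [sum(p[j] for p in gene_profiles) / n_genes for j in range(n_genomes)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treat as no signal
    return cov / (sx * sy)


# Hypothetical toy data: rows are genes in a cluster, columns are 5 reference genomes.
cluster_a = [[1, 1, 0, 1, 0], [1, 1, 0, 1, 0], [1, 0, 0, 1, 0]]
cluster_b = [[1, 1, 0, 1, 0], [1, 1, 0, 0, 0]]
cluster_c = [[0, 0, 1, 0, 1], [0, 1, 1, 0, 1]]

va = cluster_profile_vector(cluster_a)
vb = cluster_profile_vector(cluster_b)
vc = cluster_profile_vector(cluster_c)

# Clusters A and B share a similar distribution across genomes; C does not.
print(round(pearson(va, vb), 3), round(pearson(va, vc), 3))
```

Comparing aggregated vectors lets two clusters be related even when no single pair of member genes has identical profiles, which is one way to read the abstract's claim that relationships between gene clusters themselves can be characterized.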
