Abstract

A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems; the need for defining arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition and the high computational cost for finding bi-directional best matches at a large scale. We investigated the use of protein domain architectures for large scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels. We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity to identify groups of functionally equivalent proteins within and across taxonomic bounderies. As the computational cost scales linearly, and not quadratically with the number of genomes, it is suitable for large scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.

Highlights

  • Comparative analysis of genome sequences has been pivotal to unravel mechanisms shaping bacterial evolution like gene duplication, loss and acquisition[1,2], and helped in shedding light on pathogenesis and genotype-phenotype associations[3,4].Comparative analysis relies on the identification of sets of orthologous and paralogous genes and subsequent transfer of function to the encoding proteins

  • SB and domain-based clustering of orthologous groups (DAB) approaches were compared by considering eight sets of genome sequences sampled at different taxonomic levels, from species to order, preserving phylogenetic diversity

  • We have shown that domain architecture-based methods can be used as an effective approach to identify clusters of functionally equivalent proteins, leading to results similar to those obtained by classical methods based on sequence similarity

Read more

Summary

Introduction

Comparative analysis relies on the identification of sets of orthologous and paralogous genes and subsequent transfer of function to the encoding proteins. Orthologs are defined as best bi-directional hits (BBH) obtained via pairwise sequence comparison among multiple species and exploits sequence similarity for functional grouping. A generalized minimal alignment length and similarity cut-off need to be arbitrarily selected for all, which may hamper proper functional grouping. Protein sequences change faster than protein structure and proteins with same function but with low sequence similarity have been identified[5,6]. SB methods may fail to group them hampering a functional comparison. This limitation becomes even more critical when comparing either phylogenetically distant genomes or gene sequences that were acquired with horizontal gene transfer events

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call