Abstract

Background: Identification of ortholog groups is a crucial step in the comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of them do not evaluate orthology at the sub-gene level. Our domain-level ortholog clustering method, DomClust, splits proteins into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but it often fails to determine appropriate boundaries.

Results: We developed a method that uses multiple alignment information to improve domain-level ortholog classification. The method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline, DomRefine, that improves domain-level clustering by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust from a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as reference data. The agreement between the resulting classification and the classifications in the reference databases improved at almost every step of the refinement pipeline. Moreover, when TIGRFAMs was used as the reference database, the refined classification showed better agreement than the eggNOG classifications.

Conclusions: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combined with a rapid domain-level ortholog clustering method such as DomClust, it can be used to create a high-quality ortholog database that serves as a solid basis for various comparative genome analyses.
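As its name suggests, the DSP score is a variant of the classic sum-of-pairs (SP) measure for multiple alignments, which adds up a pairwise score for every pair of sequences in every alignment column. The following is a minimal sketch of that underlying SP computation only. The toy scoring values (match +2, mismatch -1, residue versus gap -2, gap versus gap 0) and the function names are assumptions for illustration, not the substitution matrix or gap penalties used by DomRefine.

```python
from itertools import combinations

GAP = "-"


def pair_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Score two aligned characters with a toy scheme (assumed values).

    Identical residues score `match`, different residues `mismatch`,
    a residue against a gap scores `gap`, and a gap against a gap scores 0.
    """
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: list[str]) -> int:
    """Classic SP score: add pair_score over every column and every pair of rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["MKV-L", "MKVAL", "MRV-L"]))  # 14
```

In this example each fully conserved column contributes 6, the K/R mismatch column contributes 0, and the column where only one sequence has a residue contributes -4.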

Highlights

  • Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes

  • It is calculated for each domain and inconsistencies in domain boundaries are evaluated as gaps so that the sum of the domain-specific sum-of-pairs (DSP) scores in the alignments of adjacent clusters reflects the quality of domain classification

  • In this study, we developed a method, DomRefine, to improve domain-level ortholog classification and applied the method to refine the ortholog classification created by our previous program, DomClust, using the proteome sets extracted from the Clusters of Orthologous Groups (COGs) and Microbial Genome Database for Comparative Analysis (MBGD) databases
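The idea that boundary inconsistencies are scored as gaps can be pictured with a small sketch. It rests on several simplifying assumptions: a toy match/mismatch/gap scheme, every sequence included when scoring every domain, exactly two domains per protein separated by a single boundary, and an exhaustive search over the boundary of one sequence. Residues assigned to a different domain are masked as gaps before scoring, so a misplaced boundary turns aligned residue pairs into residue-versus-gap penalties. Function names such as `dsp_total` and `best_boundary` are hypothetical, and the search is not DomRefine's actual refinement pipeline.

```python
from itertools import combinations

GAP = "-"


def pair_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Toy pairwise column score (assumed values, not DomRefine's)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(rows: list[str]) -> int:
    """Sum pairwise scores over all row pairs and all columns."""
    return sum(
        pair_score(a, b)
        for x, y in combinations(rows, 2)
        for a, b in zip(x, y)
    )


def split_labels(row: str, boundary: int) -> list:
    """Hypothetical two-domain labeling: columns left of `boundary` -> domain 1,
    the rest -> domain 2; gap positions get no label (None)."""
    return [None if ch == GAP else (1 if i < boundary else 2) for i, ch in enumerate(row)]


def mask_to_domain(row: str, labels: list, domain: int) -> str:
    """Keep residues assigned to `domain`; every other position becomes a gap."""
    return "".join(ch if lab == domain else GAP for ch, lab in zip(row, labels))


def dsp_total(alignment: list[str], labels: list[list]) -> int:
    """Sum, over all domains, the SP score of the alignment masked to that domain.

    A residue labeled with a different domain than its aligned partners is
    masked, so boundary disagreements are scored as residue-versus-gap pairs.
    """
    domains = {lab for row_labels in labels for lab in row_labels if lab is not None}
    return sum(
        sum_of_pairs([mask_to_domain(r, l, d) for r, l in zip(alignment, labels)])
        for d in domains
    )


def best_boundary(alignment: list[str], labels: list[list], row_index: int):
    """Exhaustively try every boundary for one row and keep the DSP-maximizing one."""
    row = alignment[row_index]
    scores = {}
    for b in range(1, len(row)):
        trial = list(labels)  # shallow copy; only one row's labels are replaced
        trial[row_index] = split_labels(row, b)
        scores[b] = dsp_total(alignment, trial)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    aln = ["MKVLAGHE", "MKVLAGHE", "MRVLSGHE"]
    # Rows 0 and 1 split at column 4; row 2 starts with a misplaced boundary at 6.
    labels = [split_labels(aln[0], 4), split_labels(aln[1], 4), split_labels(aln[2], 6)]
    print(dsp_total(aln, labels))  # 18
    best, scores = best_boundary(aln, labels, 2)
    print(best, scores[best])  # 4 36
```

In this toy case the misplaced boundary scores 18, whereas the boundary at column 4, which agrees with the other two sequences, scores 36. The search therefore recovers the consistent boundary.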


Summary

Introduction

Identification of orthologs constitutes the basis for comparative analysis of multiple genomes. It provides a foundation for inferring the evolutionary history of genes and genomes and an important clue for inferring protein functions [1]. A reliable method for identifying ortholog groups among multiple genomes is needed for comparative analysis of the huge amount of microbial genome data now available. However, the prevalence of horizontal gene transfers (HGTs) makes accurate ortholog inference infeasible [8]. A relaxed condition, i.e., closest homologs in different species regardless of HGTs, is therefore usually used as an alternative definition of orthology for prokaryotic genome comparison [4]. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but this approach often fails to determine appropriate boundaries.
