Abstract

Nowadays, defining homology relationships among sequences is essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, genome annotation, and phylogenetic inference. Since 2007, with the increase in the number of new sequences being deposited in large biological databases, researchers have begun to analyse computational methodologies and tools with the aim of selecting the most promising ones for the prediction of orthologous groups. The literature in this field describes the problems that the majority of available tools show, such as those related to accuracy, the time required for analysis (especially in light of the increasing volume of data being submitted, which requires faster techniques), and the automation of the process without manual intervention. Conducting our search through BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles in search of the most recent techniques and tools developed to solve most of the problems still existing in orthology detection. We listed the main computational tools created and developed between 2011 and 2017, taking into consideration the differences in the type of orthology analysis, outlining the main features of each tool, and pointing to the problems that each one tries to address. We also observed that several tools still use the BLAST “all-against-all” methodology as their main algorithm, which entails some limitations, such as a limited number of queries, high computational cost, and long processing time to complete the analysis.
However, promising new tools are being developed, such as OrthoVenn (which uses a Venn diagram to show the relationship of the ortholog groups generated by its algorithm); ReMark (which integrates the pipeline to automate the input process); OrthAgogue (which uses algorithms developed to minimize processing time); and proteinOrtho (which improves the accuracy of ortholog groups and was developed to deal with large amounts of biological data). We compared the main features of four tools and tested them using four prokaryotic genomes. We hope that our review will be useful for researchers and will help them select the most appropriate tool for their work in the field of orthology.

Highlights

  • Finding the homology relationship between sequences is an essential step for biological research

  • Accurate orthology recognition is an essential step for comparative genomic research (Petersen et al., 2017); in some cases, tools are needed that analyze closely related species through pangenomes (Fouts et al., 2012), or that use different strategies, such as post-translational modifications (PTMs) of proteins, for better orthology inference (Chaudhuri et al., 2015)

  • The BLAST tool and its adaptations were consolidated in subsequent studies (Kristensen et al., 2011), resulting in several publications and in the creation of large biological databases containing ortholog clusters, such as the Clusters of Orthologous Groups/euKaryotic Orthologous Groups (COG/KOG), Ortholog Data Bank (OrthoDB), and eggNOG (Kuzniar et al., 2008)

Summary

Introduction

Finding the homology relationship between sequences is an essential step for biological research.

ReMark is divided into two main steps: (1) ReMark recursively detects ortholog clusters through reciprocal BLAST best hits between multiple genomes by running a software program (RecursiveClustering.java).

The local version has not been updated since March 2014, and it requires pre-installed software such as Python, Biopython, NetworkX, gnuplot, and BLAST+, in addition to user registration via the Web.

OrthAgogue is a tool that was developed to predict orthology among large sets of data.
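
The reciprocal BLAST best hit criterion mentioned above for ReMark's first step can be illustrated with a minimal Python sketch. This is not ReMark's implementation (RecursiveClustering.java is not reproduced here); the `Hit` tuple layout, the function names, and the toy scores are hypothetical and chosen only for illustration. The sketch assumes the BLAST results for two genomes, A and B, have already been parsed into (query, subject, bit score) tuples, and it keeps the first hit seen when scores tie.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (made up): genome A searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 50.0),
          ("a2", "b2", 180.0), ("a3", "b1", 90.0)]
b_vs_a = [("b1", "a1", 195.0), ("b2", "a2", 175.0),
          ("b2", "a1", 60.0), ("b3", "a3", 40.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy example, a3 has b1 as its best hit, but b1's best hit is a1, so the pair (a3, b1) is not reciprocal and is excluded. Tools that cluster orthologs across many genomes repeat this kind of comparison for every genome pair, which is one reason all-against-all searches become costly as the number of genomes grows.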
