Abstract

Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations.

Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions.

Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.

Highlights

  • Introduction of bootstrapping alternatives such as Bayesian statistics for ortholog inference [51] will be tested and implemented in the release of the GreenPhyl pipeline, as we have not yet evaluated this strategy for genome-wide detection of orthologs

  • Phylogenomic predictions for all validated gene families in both species were achieved and we can conclude that our methodology outperforms similarly based methods

  • ... P. trichocarpa gene clusters developed with the TribeMCL software [24]. This method relies on the Markov cluster (MCL) algorithm for the assignment of proteins to families based on pre-computed sequence similarity information
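
The Markov cluster idea mentioned in the last highlight can be illustrated with a toy example. The sketch below is not TribeMCL and not the authors' implementation: it is a minimal pure-Python version of the expansion/inflation loop, and the function names (`mcl`, `normalize_columns`, `matmul`), the inflation value of 2.0 and the similarity scores are all assumptions chosen for illustration. In practice the similarity matrix would come from pre-computed pairwise sequence comparisons.

```python
def normalize_columns(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=50, tol=1e-9):
    """Group items from a symmetric similarity matrix with a toy MCL loop."""
    n = len(similarity)
    # Add self-loops so every protein keeps some weight on itself.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: two-step random walks
        # Inflation: raise entries to a power, then renormalize columns,
        # which strengthens strong links and weakens weak ones.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries weight lists the members of one family.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity scores for five proteins: 0-2 resemble each
# other, 3-4 resemble each other, and the two groups share no similarity.
sim = [
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.8, 0.0],
]
families = mcl(sim)
print(families)
```

A higher inflation value generally yields smaller, tighter families, so in a real analysis that parameter would need to be tuned against curated gene families.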


Summary

Introduction

Introduction of bootstrapping alternatives such as Bayesian statistics for ortholog inference [51] will be tested and implemented in the release of the GreenPhyl pipeline, as we have not yet evaluated this strategy for genome-wide detection of orthologs. Gene tree construction is very sensitive to annotation errors, and the efficiency of our phylogenomics pipeline is mainly due to the sequences used and the alignment filtering steps. They reject non-homologous or dissimilar sequences before construction of the gene family alignment.

... A. thaliana [1], P. trichocarpa [2] and Vitis vinifera [3]), one monocotyledon (O. sativa) [4] and a moss (Physcomitrella patens [5]) have been fully sequenced. The comparison of their gene repertoires will help to formulate hypotheses on either conservation or divergence of biological processes among several species [6]. Annotation transfer from model species will be the only way to assign a function to the majority of genes in these species.
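
To make the ortholog/paralog distinction concrete, the sketch below labels gene pairs in a small rooted gene tree. It uses a simple species-overlap heuristic, which is a stand-in chosen for illustration and not the reconciliation method of the pipeline described here: a node whose two subtrees share a species is treated as a duplication (pairs across it are paralogs), otherwise as a speciation (pairs across it are orthologs). The tree, the `species|gene` leaf naming and the function names are all hypothetical.

```python
from itertools import product


def species_of(leaf):
    """Leaves are named 'species|gene'; return the species part."""
    return leaf.split("|")[0]


def leaves(node):
    """Collect leaf names under a node (a leaf is a str, a node a 2-tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def classify_pairs(node, relations=None):
    """Label each gene pair by the type of its last common ancestor node."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    left, right = node  # assumes a strictly binary tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = ({species_of(x) for x in left_leaves}
              & {species_of(x) for x in right_leaves})
    # Shared species on both sides suggests a duplication at this node.
    label = "paralog" if shared else "ortholog"
    for a, b in product(left_leaves, right_leaves):
        relations[frozenset((a, b))] = label
    classify_pairs(left, relations)
    classify_pairs(right, relations)
    return relations


# Hypothetical family: a duplication before the Arabidopsis/rice split
# produced two copies, each retained in both species.
gene_tree = (("Atha|A1", "Osat|O1"), ("Atha|A2", "Osat|O2"))
relations = classify_pairs(gene_tree)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

In this toy family, a best-hit pairwise comparison could easily pair A1 with O2 if O1 were missing or poorly annotated, whereas the tree topology shows that those two genes are separated by a duplication.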
