Abstract

Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
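The species overlap step described above can be illustrated with a minimal sketch. It is not Possvm's implementation: the nested-tuple tree encoding, the `Species_gene` leaf naming, the helper names, and the strict zero-overlap rule for calling a node a speciation are all assumptions made for this example.

```python
from itertools import product

# Hypothetical encoding: a rooted binary gene tree as nested tuples,
# with leaves named "Species_gene" (the species prefix is an assumption).

def species_of(leaf):
    """Return the species prefix of a leaf name."""
    return leaf.split("_", 1)[0]

def leaves(node):
    """List all leaf names below a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def ortholog_pairs(node, pairs=None):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = ({species_of(g) for g in left_leaves}
              & {species_of(g) for g in right_leaves})
    if not shared:
        # No species in common between the two subtrees: speciation node,
        # so every gene on one side is an ortholog of every gene on the other.
        for a, b in product(left_leaves, right_leaves):
            pairs.add(frozenset((a, b)))
    # Shared species: duplication node, so no pairs are added across it.
    ortholog_pairs(left, pairs)
    ortholog_pairs(right, pairs)
    return pairs

# Toy tree: an ancient duplication (root) followed by speciations in each copy.
tree = ((("Hsap_A", "Mmus_A"), "Dmel_A"),
        (("Hsap_B", "Mmus_B"), "Dmel_B"))
pairs = ortholog_pairs(tree)
print(len(pairs))
```

In this toy tree the root is called a duplication because both subtrees contain the same three species, so genes of the A and B copies are never paired with each other.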

Highlights

  • Gene orthology inference is a central problem in genomics and comparative biology (Koonin 2005)

  • We present Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]), a flexible and accurate tool to identify pairs and clusters of orthologous genes from precomputed phylogenies, and obtain inclusive gene family classifications

  • As the species overlap algorithm relies on the implicit taxonomic information contained in the gene tree’s topology, this approach is suitable for cases where the species tree is unknown or unavailable
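The clustering step named in the highlights can be sketched as a small, pure-Python version of Markov clustering applied to a graph of pairwise orthologs. This is an illustrative sketch only: the function names, the unweighted edges, the inflation value of 2.0, and the fixed iteration count are assumptions, and a real analysis would normally rely on the reference MCL program.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix so that it sums to 1."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

def mcl(nodes, edges, inflation=2.0, iterations=50):
    """Cluster an undirected graph by alternating expansion and inflation."""
    n = len(nodes)
    index = {gene: i for i, gene in enumerate(nodes)}
    # Adjacency matrix with self-loops on the diagonal.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row with non-negligible entries defines one cluster.
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Toy ortholog graph: two groups of mutually orthologous genes.
nodes = ["Hsap_A", "Mmus_A", "Dmel_A", "Hsap_B", "Mmus_B", "Dmel_B"]
edges = [("Hsap_A", "Mmus_A"), ("Hsap_A", "Dmel_A"), ("Mmus_A", "Dmel_A"),
         ("Hsap_B", "Mmus_B"), ("Hsap_B", "Dmel_B"), ("Mmus_B", "Dmel_B")]
clusters = mcl(nodes, edges)
for cluster in clusters:
    print(sorted(cluster))
```

Here the edges are given equal weight; how edges should be weighted, and which inflation value suits a given data set, are left open by this sketch.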

Summary

Introduction

Gene orthology inference is a central problem in genomics and comparative biology (Koonin 2005). We measured accuracy using all orthogroups containing a majority of genes from the reference families. This more inclusive metric results in higher recall without a detrimental effect on precision (supplementary material S3, Supplementary Material online). The orthogroups that can be inferred from the pairwise orthologies available in PhylomeDB, which are based on the species overlap algorithm but lack a taxonomically unbiased clustering step (Huerta-Cepas et al. 2014), are precise but have lower recall. We have evaluated the effect of the iterative tree rooting strategy on orthology inference. This rooting heuristic often improved recall in a simulated set of gene trees with severe long-branch artifacts, albeit at the cost of occasionally lower precision due to overclustering. The precision of pairwise orthology relationships within each orthogroup would be unaffected by the rooting strategy.
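One possible reading of the majority-based accuracy metric can be sketched as follows. The function name and the exact matching rule (an orthogroup counts towards a reference family when more than half of its genes belong to that family) are assumptions of this example, not a definition taken from the paper.

```python
def precision_recall(reference, orthogroups):
    """Score predicted orthogroups against one reference gene family.

    reference:   set of gene names in the curated family
    orthogroups: list of sets of gene names (predicted clusters)
    """
    matched = set()
    for group in orthogroups:
        # Assumed rule: keep orthogroups made up mostly of reference genes.
        if len(group & reference) > len(group) / 2:
            matched |= group
    true_pos = len(matched & reference)
    false_pos = len(matched - reference)
    false_neg = len(reference - matched)
    precision = true_pos / (true_pos + false_pos) if matched else 0.0
    recall = true_pos / (true_pos + false_neg) if reference else 0.0
    return precision, recall

# Toy example with hypothetical gene names.
reference = {"a", "b", "c", "d", "e"}
orthogroups = [{"a", "b", "x"}, {"c", "d"}, {"e", "y", "z"}]
precision, recall = precision_recall(reference, orthogroups)
print(precision, recall)  # 0.8 0.8
```

The first two orthogroups are counted because reference genes form their majority; the third is not, so gene `e` is a false negative and gene `x` a false positive.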

