Domainoid: domain-oriented orthology inference

Emma Persson, Mateusz Kaduk, Erik L L Sonnhammer, Sofia K Forslund

doi:10.1186/s12859-019-3137-2

Abstract

Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, domains. The domain architecture of a protein is vital for its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately, to infer groups of orthologous domains.

Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.

Availability: https://bitbucket.org/sonnhammergroup/domainoid/

Highlights

Orthology inference is normally based on full-length protein sequences
We analyzed the resulting orthologous domains for discordant domain orthologs, where different domains on the same protein have different evolutionary histories. The orthologous domains are also used to identify orthologs at the full protein sequence level, and we show that Domainoid can find domain orthologs that are not detectable by full-length approaches
Discordant orthology: to assess the usefulness of the orthologous domains generated by running InParanoid on the domain-equivalent sequences, we looked for cases where the domain-based approach to orthology inference produced orthologous pairs not found by the full-length approach
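
The comparison described in the last highlight can be pictured as a set difference between the protein pairs supported by domain orthologs and the pairs reported by the full-length run. The sketch below is illustrative only and is not taken from the Domainoid code base: the function name, the tuple layout (protein in species A, protein in species B, Pfam accession) and the example identifiers are all assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical record types, chosen for this sketch only.
DomainPair = Tuple[str, str, str]   # (protein_a, protein_b, pfam_id)
ProteinPair = Tuple[str, str]       # (protein_a, protein_b)


def pairs_found_only_by_domains(
    domain_pairs: Iterable[DomainPair],
    full_length_pairs: Iterable[ProteinPair],
) -> Dict[FrozenSet[str], Set[str]]:
    """Return protein pairs supported by domain orthologs but absent
    from the full-length result, with the supporting domain families."""
    # Unordered pairs, so (a, b) and (b, a) count as the same pair.
    full = {frozenset(pair) for pair in full_length_pairs}
    only: Dict[FrozenSet[str], Set[str]] = {}
    for prot_a, prot_b, pfam_id in domain_pairs:
        pair = frozenset((prot_a, prot_b))
        if pair not in full:
            only.setdefault(pair, set()).add(pfam_id)
    return only


# Made-up identifiers for illustration.
domain_pairs = [
    ("h1", "m1", "PF1"),
    ("h2", "m2", "PF2"),
    ("h2", "m3", "PF3"),
]
full_length_pairs = [("h1", "m1"), ("m2", "h2")]
result = pairs_found_only_by_domains(domain_pairs, full_length_pairs)
```

Here only the pair h2/m3 is absent from the full-length result, so it is the only pair reported, together with the domain family that supports it.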

Summary

Introduction

Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, domains. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. To apply a domain-aware approach for orthology inference, one can either use an unsupervised algorithm for domain detection [12] or employ a domain dictionary such as Pfam [13] to divide sequences into domains before the orthology inference. No matter how this is done, the crucial algorithmic change is to treat domains rather than full-length proteins as the operating objects for the algorithm. The growing interest in domain-aware methods highlights the importance of accounting for diversity in domain architectures among orthologs.
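
To make the idea of treating domains as the operating objects concrete, the following sketch cuts each protein into one sequence per annotated domain, so that a downstream orthology method sees domains instead of full-length proteins. It is a minimal illustration under stated assumptions, not Domainoid's implementation: the `DomainHit` record, the function name, the identifier format and the 1-based inclusive coordinates are all hypothetical, and the domain annotations are assumed to have been produced beforehand (for example from a Pfam search).

```python
from typing import Dict, Iterable, NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical domain annotation for one protein."""
    protein_id: str
    pfam_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def extract_domain_sequences(
    proteins: Dict[str, str],
    hits: Iterable[DomainHit],
) -> Dict[str, str]:
    """Return one sequence per domain hit, keyed by an identifier
    that records the source protein, the family and the coordinates."""
    records: Dict[str, str] = {}
    for hit in hits:
        seq = proteins.get(hit.protein_id)
        # Skip hits for unknown proteins or with out-of-range coordinates.
        if seq is None or not (1 <= hit.start <= hit.end <= len(seq)):
            continue
        key = f"{hit.protein_id}|{hit.pfam_id}|{hit.start}-{hit.end}"
        records[key] = seq[hit.start - 1:hit.end]
    return records


# Made-up sequence and annotations for illustration.
proteins = {"P1": "ABCDEFGHIJ"}
hits = [
    DomainHit("P1", "PF00001", 2, 4),
    DomainHit("P1", "PF00002", 6, 10),
    DomainHit("P1", "PF00003", 8, 20),  # out of range, skipped
]
domains = extract_domain_sequences(proteins, hits)
```

Keeping the source protein in each domain identifier is a design choice of this sketch: it lets domain-level results be mapped back to proteins afterwards.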

Methods

Results

Discussion

Conclusion