Abstract

Annotation of genes and their products is largely guided by inferred homology. Sequence similarity is the primary measure used for annotation; domain content and order have been given less importance, despite the fact that domain insertions, deletions and positional changes can introduce functional variety. Several recently developed methods quantify domain architecture similarity, but they depend on sequence alignments and focus only on homologous proteins. We present an alignment-free domain architecture-similarity search (ADASS) algorithm that identifies proteins that share very poor sequence similarity yet have similar domain architectures. We introduce a “singlet matching-triplet comparison” method in ADASS, wherein each triplet of domains is compared with other triplets in a pairwise comparison of two domain architectures. Different events in the triplet comparison are scored according to a scoring scheme, and an average pairwise distance score (Domain Architecture Distance score, or DAD score) is calculated between protein domain architectures. We take the domain architectures that contain a selected domain, termed the centric domain, and cluster them based on the DAD score. The algorithm has a high Positive Predictive Value (PPV) with respect to the clustering of the sequences of selected domain architectures. A comparison of domain architecture-based dendrograms produced by ADASS and by an existing method revealed that ADASS can classify proteins according to the extent of domain architecture-level similarity. ADASS is more relevant for proteins with tiny domains that contribute little to the overall sequence similarity but contribute significantly to the overall function.

Highlights

  • Most of the classifications of multi-domain proteins are based on the sequence similarity between the characteristic functional domains which are common between the proteins [8]

  • The alignment-free domain architecture-similarity search (ADASS) scores domain architecture pairs so that those differing by a few domains acquire a low Domain Architecture Distance (DAD) score, whereas those differing by many domains, either in number or in order, acquire a high DAD score

  • Results & Discussion: The DAD score distinguishes homologues from non-homologues in a more functionally relevant fashion than a sequence-based approach. The domain architectures constituting dataset1, from the Pkinase and Helicase protein families, were selected so that the families were mutually exclusive in terms of their domain content (Figure 3a)
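
The idea of matching single domains and then comparing the triplets around them can be illustrated with a small Python sketch. This is not the ADASS scoring scheme, which is not given here: the function names, the padding symbol, the 0.5 penalty per differing neighbour and the 1.0 penalty for an unmatched domain are all assumptions chosen for illustration. The example architectures are toy inputs, not data from the study.

```python
from itertools import product

PAD = "-"  # assumed placeholder for a missing neighbour at either end


def triplets_around(arch, domain):
    """Return (left, domain, right) for every occurrence of `domain` in `arch`."""
    padded = [PAD] + list(arch) + [PAD]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
        if padded[i] == domain
    ]


def triplet_distance(t1, t2):
    """Hypothetical scoring: 0.5 for each flanking position that differs."""
    return 0.5 * (t1[0] != t2[0]) + 0.5 * (t1[2] != t2[2])


def toy_distance(arch_a, arch_b, unmatched_penalty=1.0):
    """Average per-domain distance between two architectures (toy scheme)."""
    all_domains = sorted(set(arch_a) | set(arch_b))
    if not all_domains:
        return 0.0  # two empty architectures are treated as identical
    shared = set(arch_a) & set(arch_b)
    scores = []
    for domain in all_domains:
        if domain not in shared:
            # Domain present in only one architecture (assumed penalty).
            scores.append(unmatched_penalty)
            continue
        # Singlet match found: keep the best-agreeing pair of triplets.
        pairs = product(triplets_around(arch_a, domain),
                        triplets_around(arch_b, domain))
        scores.append(min(triplet_distance(x, y) for x, y in pairs))
    return sum(scores) / len(scores)


# Toy architectures written as ordered lists of domain names.
arch_1 = ["SH3", "SH2", "Pkinase"]
arch_2 = ["SH2", "Pkinase"]
arch_3 = ["Helicase_C", "DEAD"]

print(toy_distance(arch_1, arch_2))  # differ by one domain
print(toy_distance(arch_1, arch_3))  # no shared domains
```

Under these assumed penalties, the pair that differs by a single domain scores 0.5 and the pair with no shared domains scores 1.0, which mirrors the low-versus-high behaviour described for the DAD score without reproducing its actual values.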

Introduction

Organisms have an inherent tendency to innovate and create new proteins and pathways by gene duplication [1], fusion and fission [2] through mechanisms like recombination operating at the genomic level [3]. This has resulted in a multitude of protein domain architectures having diverse functions within and [...] Though this strategy works well in the case of single-domain proteins, sequence identity would not be sufficient to distinguish between homologues of multi-domain proteins.

Most classifications of multi-domain proteins are based on the sequence similarity between the characteristic functional domains that are common between the proteins [8]. Multi-domain proteins having high sequence similarity in the characteristic domain could still differ functionally, due to the presence of different associated domains. Due to the differences in length of the associated domains, the contribution of these domains to the overall sequence similarity of the [...]
