Abstract

Background: The general method used to determine the function of newly discovered proteins is to transfer annotations from well-characterized homologous proteins. The process of selecting homologous proteins can largely be classified into sequence-based and domain-based approaches. Compared to sequence-based methods, domain-based methods have several advantages for identifying distant homology and homology among proteins with multiple domains. However, these methods are challenged by large families defined by 'promiscuous' (or 'mobile') domains.

Results: Here we present a measure of domain architecture similarity, called Weighed Domain Architecture Comparison (WDAC), which can be used to identify homologs of multidomain proteins. To distinguish promiscuous domains from conventional protein domains, we assigned a weight score to each Pfam domain extracted from RefSeq proteins, based on its abundance and versatility. To measure the similarity of two domain architectures, we used cosine similarity, a similarity measure from information retrieval. We combined sequence similarity with domain architecture comparisons to identify proteins belonging to the same domain architecture. Using the human and nematode proteomes, we compared WDAC with an unweighted domain architecture method (DAC) to evaluate the effectiveness of domain weight scores. We found that WDAC is better at identifying homology among multidomain proteins.

Conclusion: Our analysis indicates that considering domain weight scores in domain architecture comparisons improves protein homology identification. We developed a web-based server to allow users to compare their proteins with protein domain architectures.
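The idea of comparing weighted domain architectures with cosine similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the weight values below are hypothetical placeholders, the function name `weighted_cosine` is invented for illustration, and each architecture is simplified to a set of Pfam domain names (domain order and repeats are ignored).

```python
import math

def weighted_cosine(arch_a, arch_b, weights, default_weight=1.0):
    """Cosine similarity between two domain architectures.

    Each architecture is reduced to its set of domains, and each domain
    contributes its weight as one vector component. Domains missing from
    `weights` fall back to `default_weight` (an assumption of this sketch).
    """
    domains_a, domains_b = set(arch_a), set(arch_b)

    def w(domain):
        return weights.get(domain, default_weight)

    # Only shared domains contribute to the dot product.
    dot = sum(w(d) ** 2 for d in domains_a & domains_b)
    norm_a = math.sqrt(sum(w(d) ** 2 for d in domains_a))
    norm_b = math.sqrt(sum(w(d) ** 2 for d in domains_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical weights: a promiscuous domain gets a low score.
example_weights = {"SH3_1": 0.1, "Pkinase": 1.0, "PH": 1.0}

# Two proteins that share only the promiscuous domain.
protein_a = ["SH3_1", "Pkinase"]
protein_b = ["SH3_1", "PH"]

unweighted = weighted_cosine(protein_a, protein_b, weights={})
weighted = weighted_cosine(protein_a, protein_b, weights=example_weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

With every weight equal, the shared promiscuous domain alone makes the two architectures look moderately similar. Down-weighting it pushes the score toward zero, which is the behavior the weight scores are meant to produce.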

Highlights

  • Weighed Domain Architecture Comparison (WDAC) yielded 2,328 true positives (91%), compared with 2,175 (85%) for the unweighted domain architecture method (DAC), indicating that considering weight scores in domain architecture comparison can improve homology identification

Summary

Introduction

The general method used to determine the function of newly discovered proteins is to transfer annotations from well-characterized homologous proteins. Current methods for identifying homologous proteins can largely be classified into sequence-based and domain-based approaches [3]. Sequence comparison methods, such as BLAST and FASTA, are the traditional and most commonly used approaches for identifying homologous genes [4,5]. These methods assume that sequences with significant similarity share common ancestry, i.e. are homologs. However, the existence of multi-domain proteins and complex evolutionary mechanisms poses difficulties for sequence-based methods [6]. Domain-based methods, by comparison, have several advantages for identifying distant homology and homology among proteins with multiple domains.
