Domain Architecture Comparison for Multidomain Homology Identification

N Song, R.D Sedgewick, D Durand

doi:10.1089/cmb.2007.a009

Open Access

Abstract

Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the similarity of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of protein domain content. We developed several schemes for scoring the homology of a pair of protein sequences based on methods used in the field of information retrieval. We evaluate the proposed methods and methods used in the literature using a benchmark of fifteen sequence families of known evolutionary history. The results of these studies demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance both of weighting promiscuous domains and of compensating for the statistical effect of having a large number of domains in a protein. Using logistic regression, we demonstrate the benefit of combining similarity measures based on domain content with sequence similarity measures.
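To make the idea of an information-retrieval-style domain content score concrete, here is a minimal sketch in Python. It is not the scoring scheme from the paper: it assumes an inverse-document-frequency style weight (so that domains found in many proteins, i.e. promiscuous domains, count for less) and a cosine normalization (one possible way to compensate for proteins with many domains). The protein labels, the toy domain lists, and the function names idf_weights and weighted_cosine are all hypothetical.

```python
import math
from collections import Counter

def idf_weights(architectures):
    """Assumed weighting: log(N / number of proteins containing the domain).

    Domains that occur in many proteins get weights near zero.
    """
    n = len(architectures)
    doc_freq = Counter(d for arch in architectures for d in set(arch))
    return {d: math.log(n / df) for d, df in doc_freq.items()}

def weighted_cosine(arch_a, arch_b, weights):
    """Cosine similarity between weighted domain-count vectors."""
    a, b = Counter(arch_a), Counter(arch_b)
    shared = a.keys() & b.keys()
    dot = sum(a[d] * b[d] * weights.get(d, 0.0) ** 2 for d in shared)
    norm_a = math.sqrt(sum((c * weights.get(d, 0.0)) ** 2 for d, c in a.items()))
    norm_b = math.sqrt(sum((c * weights.get(d, 0.0)) ** 2 for d, c in b.items()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no informative domains in at least one protein
    return dot / (norm_a * norm_b)

# Toy data (hypothetical): each protein is a list of domain labels.
toy = {
    "A": ["SH3", "SH2", "Pkinase"],
    "B": ["SH2", "Pkinase"],
    "C": ["SH3", "PDZ"],
    "D": ["SH3", "C2"],
    "E": ["SH3", "Ig", "Ig"],
}
weights = idf_weights(list(toy.values()))

sim_ab = weighted_cosine(toy["A"], toy["B"], weights)  # share two rarer domains
sim_cd = weighted_cosine(toy["C"], toy["D"], weights)  # share only the common SH3
print(f"A vs B: {sim_ab:.3f}   C vs D: {sim_cd:.3f}")
```

In this toy set, SH3 appears in four of five proteins, so a pair that shares only SH3 scores far lower than a pair that shares two less common domains, even though an unweighted comparison would treat the shared SH3 as substantial evidence.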

Highlights

The need for accurate multi-domain homology identification is urgent
We focused on vertebrate data because the multi-domain families that challenge traditional homology identification methods tend to be larger and more complex in vertebrates (Aravind et al, 2001; Chothia et al, 2003; Patthy, 2003; Li et al, 2001; International Human Genome Sequencing Consortium, 2001; Venter et al, 2001; Wuchty, 2001; Ye and Godzik, 2004; Wuchty and Almaas, 2005)
We first evaluate the weighted similarity measures, then compare similarity and distance approaches to domain architecture comparison

Summary

Introduction

The need for accurate multi-domain homology identification is urgent. Multi-domain proteins represent a substantial fraction of the proteome: about 27% of proteins in bacteria and 39% of proteins in metazoa are multi-domain proteins (Tordai et al, 2005). These are proteins of particular functional importance. Complex multi-domain families are involved in cell-cell signaling, cellular adhesion, and cellular migration, functions crucial to the evolution of multicellularity (BenShlomo et al, 2003; Miyata and Suga, 2001). Since cancer typically arises from failures in signaling or apoptosis, most oncogenes are multi-domain

Journal: Journal of Computational Biology
Publication Date: May 1, 2007
Citations: 68
License type: cc-by
