Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Bobbie-Jo M Webb-Robertson, Kyle G Ratuiste, Christopher S Oehmen

doi:10.1186/1471-2105-11-145

Open Access

https://doi.org/10.1186/1471-2105-11-145

Abstract

Background

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.

Results

We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.

Conclusions

A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
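The feature construction described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the property is the Kyte-Doolittle hydropathy scale, a 4-mer's score is the mean of its residue values, normalization is a z-score against the mean and standard deviation over all 20^4 possible 4-mers, and the distribution is summarized as a 10-bin histogram over z in [-3, 3]. The names `kmer_score`, `null_stats`, and `pcd_features` are hypothetical.

```python
from functools import lru_cache
from itertools import product
from statistics import mean, pstdev

# Assumption: Kyte-Doolittle hydropathy stands in for "a physicochemical property".
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4  # 4-mers, as stated in the abstract


def kmer_score(kmer):
    """Assumed scoring rule: mean property value of the residues in the k-mer."""
    return sum(KD[a] for a in kmer) / len(kmer)


@lru_cache(maxsize=None)
def null_stats(k=K):
    """Mean and standard deviation of the score over all 20**k possible k-mers."""
    scores = [kmer_score(p) for p in product(KD, repeat=k)]
    return mean(scores), pstdev(scores)


def pcd_features(seq, bins=10, lo=-3.0, hi=3.0):
    """Histogram of null-normalized 4-mer scores, returned as bin fractions."""
    mu, sigma = null_stats()
    width = (hi - lo) / bins
    counts = [0] * bins
    n = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if any(a not in KD for a in kmer):
            continue  # skip windows containing non-standard residues
        z = (kmer_score(kmer) - mu) / sigma
        idx = min(max(int((z - lo) / width), 0), bins - 1)  # clip to edge bins
        counts[idx] += 1
        n += 1
    return [c / n for c in counts] if n else [0.0] * bins


features = pcd_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

Because the null statistics depend only on the property scale, they are computed once and cached; transforming a sequence is then a single pass over its 4-mers, which is consistent with the low feature-generation cost the abstract describes. The resulting fixed-length vector could be passed to any SVM implementation.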

Highlights

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level
Much of the research in the area of remote homology detection has focused on the use of machine learning algorithms, largely support vector machines (SVMs), to build protein-family-centric predictive models
We present a computationally streamlined implementation of SVM homology detection based on physicochemical distributions (SVM-PCD)

Summary

Introduction

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection. Much of the research in the area of remote homology detection has focused on the use of machine learning algorithms, largely support vector machines (SVMs), to build protein-family-centric predictive models, leading to a large number of approaches [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]. For algorithms that are not as computationally demanding in the feature generation stage, the average AUC values typically range from ~0.87 to ~0.9

Methods

Results

Conclusion

Journal: BMC Bioinformatics
Publication Date: Mar 19, 2010
Citations: 53
License type: CC BY 2.0

Similar Papers

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers
Hilmi M Muda ... Razib M Othman
Computers in Biology and Medicine | VOL. 41
25 Jun 2011

Reducing dimensionality in remote homology detection using predicted contact maps
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine | VOL. 59
31 Jan 2015

Remote homology detection incorporating the context of physicochemical properties
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine | VOL. 45
27 Nov 2013

Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties
Yuchen Yang ... Kuo-Bin Li
Journal of Theoretical Biology | VOL. 252
07 Feb 2008
