Word correlation matrices for protein sequence analysis and remote homology detection

Thomas Lingner, Peter Meinicke

doi:10.1186/1471-2105-9-259

Open Access


Journal: BMC Bioinformatics
Publication Date: Jun 3, 2008
Citations: 42
License type: CC BY 2.0

Affiliation: University of Göttingen

Abstract

Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive.

Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection.

Conclusion: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.
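The idea of a kernel built from average word similarity, and of an explicit feature space behind it, can be illustrated with a minimal sketch. The abstract does not define the word similarity function, so the sketch assumes one: the squared number of matching positions between two words of length `k`. The function names, the default `k=3`, and the dictionary-based matrix representation are all hypothetical choices made for illustration. Under that assumption, the averaged pairwise similarity equals the inner product of two per-sequence matrices that record how often pairs of (position, residue) combinations co-occur within a word.

```python
from collections import Counter
from itertools import product


def words(seq, k):
    """Return all overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def avg_word_similarity(a, b, k=3):
    """Average similarity over all word pairs of two sequences.

    Assumed similarity: squared count of matching positions.
    """
    wa, wb = words(a, k), words(b, k)
    total = 0
    for u in wa:
        for v in wb:
            matches = sum(x == y for x, y in zip(u, v))
            total += matches ** 2
    return total / (len(wa) * len(wb))


def correlation_matrix(seq, k=3):
    """Sparse per-sequence matrix, stored as a dict.

    The key (p, x, q, y) holds the fraction of words that have
    residue x at position p and residue y at position q.
    """
    ws = words(seq, k)
    counts = Counter()
    for w in ws:
        for p, q in product(range(k), repeat=2):
            counts[(p, w[p], q, w[q])] += 1
    return {key: c / len(ws) for key, c in counts.items()}


def kernel_from_matrices(ca, cb):
    """Inner product of two sparse matrices from correlation_matrix."""
    return sum(v * cb.get(key, 0.0) for key, v in ca.items())
```

In this sketch each sequence is mapped to its matrix once, so comparing a new sequence against a fixed model needs only one inner product instead of a loop over all word pairs.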

Highlights

Classification of protein sequences is a central problem in computational biology
We provide some further analysis of the associated sequence representation, which gives rise to a well interpretable feature space in terms of "word correlation matrices" (WCMs)
We presented a new approach for protein sequence representation based on word correlation matrices (WCM)

Summary

Introduction

Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. For closely related sequences, i.e. sequences with a similarity of more than 80% at the amino acid level, classification can be done by pairwise comparison methods such as the Smith-Waterman local alignment algorithm [1] or BLAST [2]. However, these methods often fail when sequence similarity is low. Remote homology detection methods are often based on a statistical representation of protein families and can be divided into two major categories. First, profile-based methods provide a non-discriminative approach to family-specific representation of sequence properties.
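As a rough illustration of pairwise comparison by local alignment, the following sketch computes a Smith-Waterman score with dynamic programming. The scoring scheme (match +2, mismatch -1, linear gap penalty -1) is an assumption chosen for brevity. Practical tools typically use a substitution matrix and affine gap penalties, and the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    Uses a simplified scoring scheme (assumed, not from the text):
    a fixed match/mismatch score and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the table are kept because the sketch returns the score alone. Recovering the alignment itself would require the full table and a traceback step.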


Similar Papers

Remote homology detection based on oligomer distances
Thomas Lingner ... Peter Meinicke
Computer applications in the biosciences : CABIOS | VOL. 22
12 Jul 2006

Remote protein homology detection using 3D models enriched with physicochemical properties (original title: Detección de homología remota de proteínas usando modelos 3D enriquecidos con propiedades fisicoquímicas)
Irene Tischer ... Oscar F Bedoya
INGENIERÍA Y COMPETITIVIDAD | VOL. 17
19 Jun 2015

Remote homology detection incorporating the context of physicochemical properties
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine | VOL. 45
27 Nov 2013

Reducing dimensionality in remote homology detection using predicted contact maps
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine | VOL. 59
31 Jan 2015
