Exploiting physico-chemical properties in string kernels

Nora C Toussaint, Gunnar Rätsch, Oliver Kohlbacher, Christian Widmer

doi:10.1186/1471-2105-11-s8-s7


Open Access

https://doi.org/10.1186/1471-2105-11-s8-s7


Abstract

Background: String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when applied to amino acids: they ignore an important piece of information, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is scarce. So far, only very few approaches have aimed at combining these two ideas.

Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels.

Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference.

Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.

Highlights

String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences
Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask
The main goal of this work is the methodological improvement of existing string kernels by incorporating prior knowledge on amino acid (AA) properties

Summary

Introduction

String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when applied to amino acids: they ignore an important piece of information, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is scarce.

String kernels for sequence classification

Kernels that have been proposed for classifying nucleic and amino acid sequences can be divided into two main classes: (a) kernels describing the sequence content of sequences of varying length and (b) kernels for identifying localized signals within sequences of fixed length.

Kernels describing l-mer content

The so-called spectrum kernel was first proposed for classifying protein sequences [11].
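As an illustration of the l-mer content idea, here is a minimal Python sketch of the spectrum kernel in its commonly stated form: each sequence is mapped to the counts of its contiguous length-l substrings, and the kernel value is the inner product of the two count vectors. This is not the formulation or code from the paper or from the Shogun toolbox. The function names `lmer_counts` and `spectrum_kernel` and the default l = 3 are assumptions made for this example.

```python
from collections import Counter


def lmer_counts(seq: str, l: int) -> Counter:
    """Count every contiguous substring of length l (l-mer) in seq.

    Hypothetical helper for this sketch; returns an empty Counter
    when seq is shorter than l.
    """
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def spectrum_kernel(x: str, y: str, l: int = 3) -> int:
    """Inner product of the l-mer count vectors of x and y.

    The default l = 3 is an arbitrary choice for illustration.
    """
    cx, cy = lmer_counts(x, l), lmer_counts(y, l)
    # Only l-mers that occur in both sequences contribute a nonzero term.
    return sum(count * cy[lmer] for lmer, count in cx.items() if lmer in cy)
```

For example, with l = 2 the sequence "ABAB" contains AB twice and BA once, while "BABA" contains BA twice and AB once, so `spectrum_kernel("ABAB", "BABA", 2)` evaluates to 2·1 + 1·2 = 4. Note that this plain version treats two l-mers as either identical or unrelated, which is exactly the limitation for amino acids that the text describes.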


Journal: BMC Bioinformatics
Publication Date: Oct 1, 2010
Citations: 55
License type: CC BY 2.0
