Learning Interpretable SVMs for Biological Sequence Classification

Gunnar Rätsch, Christin Schäfer, Sören Sonnenburg

doi:10.1186/1471-2105-7-s1-s9

Open Access

https://doi.org/10.1186/1471-2105-7-s1-s9

Abstract

Background: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy they lack interpretability. In many applications, it does not suffice that an algorithm just detects a biological signal in the sequence, but it should also provide means to interpret its solution in order to gain biological insight.

Results: We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination.

Conclusion: The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.

Highlights

Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems
The main goal of this work is to provide an explanation of the SVM decision rule, for instance by identifying sequence positions that are important for discrimination
We show that our Multiple Kernel Learning (MKL) algorithm performs as well as or slightly better than the standard SVM and leads to SVM classification functions that are computationally more efficient

Summary

Introduction

Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. It does not suffice that an algorithm just detects a biological signal in the sequence; it should also provide means to interpret its solution in order to gain biological insight. Kernel-based methods such as SVMs have proven to be powerful for sequence analysis problems frequently appearing in computational biology. One of the problems with kernel methods compared to probabilistic methods (such as position weight matrices or interpolated Markov models [5]) is that the resulting decision function (1) is hard to interpret and difficult to use for extracting relevant biological knowledge (see [3,6]). We approach this problem by considering convex combinations of M kernels, i.e.
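A convex combination of M kernels, as mentioned in the introduction, is a weighted sum of sub-kernels with non-negative weights that sum to one. The following is a minimal sketch of that idea, not the authors' implementation: the per-position sub-kernel `position_kernel`, the helper names `combined_kernel` and `important_positions`, and the example weights are all hypothetical and chosen only for illustration. In the paper the weights are learned by the MKL algorithm; here they are fixed by hand.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[str, str], float]


def position_kernel(pos: int, k: int = 1) -> Kernel:
    """Hypothetical sub-kernel: 1.0 if both sequences share the same k-mer at `pos`."""
    def kernel(x: str, y: str) -> float:
        a, b = x[pos:pos + k], y[pos:pos + k]
        # Only full-length k-mers count; truncated windows never match.
        return 1.0 if len(a) == k and a == b else 0.0
    return kernel


def combined_kernel(kernels: Sequence[Kernel], beta: Sequence[float]) -> Kernel:
    """Convex combination k(x, y) = sum_j beta_j * k_j(x, y)."""
    if len(kernels) != len(beta):
        raise ValueError("need exactly one weight per sub-kernel")
    if any(b < 0 for b in beta) or abs(sum(beta) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def kernel(x: str, y: str) -> float:
        return sum(b * k_j(x, y) for b, k_j in zip(beta, kernels))
    return kernel


def important_positions(beta: Sequence[float], tol: float = 1e-12) -> List[int]:
    """Indices of sub-kernels with non-zero weight (the 'sparse' part)."""
    return [j for j, b in enumerate(beta) if b > tol]


# One sub-kernel per sequence position (an assumption made for this sketch).
kernels = [position_kernel(p) for p in range(4)]
# Hand-picked sparse weights; an MKL solver would learn these from data.
beta = [0.0, 0.5, 0.5, 0.0]
k = combined_kernel(kernels, beta)

value_both = k("ACGT", "TCGA")  # positions 1 and 2 match
value_one = k("ACGT", "TCTT")   # only position 1 matches
selected = important_positions(beta)
```

With one sub-kernel per position, a zero weight removes that position from the decision function entirely, which is what makes a sparse weighting readable as a statement about which sequence positions matter.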

Journal: BMC Bioinformatics	Publication Date: Mar 1, 2006
Citations: 108	License type: CC BY 2.0
