Abstract

Background: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications, it does not suffice for an algorithm merely to detect a biological signal in the sequence; it should also provide a means of interpreting its solution in order to gain biological insight.

Results: We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination.

Conclusion: The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and it reliably identifies a few statistically significant positions.
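To make "sparse weightings of substring locations" concrete, here is a minimal Python sketch built on assumptions of our own. Each sequence position p gets a hypothetical base kernel `position_kernel` that checks whether the length-d substrings starting at p are identical, and a hand-picked weight vector `beta` stands in for the weights an MKL solver would learn. It illustrates the idea only and is not the authors' algorithm; the learning step itself is omitted.

```python
# Sketch: one base kernel per sequence position, plus a sparse weighting
# that points at the positions that matter. The weights are illustrative
# stand-ins; in MKL they would be learned from labelled sequences.


def position_kernel(x: str, y: str, p: int, d: int = 3) -> float:
    """Hypothetical base kernel k_p: 1.0 if the length-d substrings of x and y
    starting at position p are identical, else 0.0."""
    return 1.0 if x[p:p + d] == y[p:p + d] else 0.0


def combined_kernel(x: str, y: str, beta: list[float], d: int = 3) -> float:
    """Weighted sum over positions: sum_p beta[p] * k_p(x, y).

    beta should have at most len(x) - d + 1 entries (one per valid start
    position); zero-weight positions are skipped entirely.
    """
    return sum(b * position_kernel(x, y, p, d) for p, b in enumerate(beta) if b > 0.0)


def important_positions(beta: list[float], tol: float = 1e-8) -> list[int]:
    """Positions with non-zero weight, sorted by decreasing weight."""
    nonzero = [p for p, b in enumerate(beta) if b > tol]
    return sorted(nonzero, key=lambda p: -beta[p])


if __name__ == "__main__":
    x = "ACGTAGGTAAGT"
    y = "TTGTAGGTCAGT"
    # One weight per start position of a length-3 substring (12 - 3 + 1 = 10).
    beta = [0.0, 0.0, 0.0, 0.6, 0.3, 0.0, 0.0, 0.0, 0.1, 0.0]
    print(combined_kernel(x, y, beta))
    print(important_positions(beta))
```

Positions whose weight is zero contribute nothing to the kernel, so the non-zero entries of `beta`, ranked by size, are what one would inspect when looking for biologically relevant sequence locations.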

Highlights

  • Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems

  • The main goal of this work is to provide an explanation of the SVM decision rule, for instance by identifying sequence positions that are important for discrimination

  • We show that our Multiple Kernel Learning (MKL) algorithm performs as well as, or slightly better than, the standard SVM and leads to SVM classification functions that are computationally more efficient
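The efficiency point in the last highlight follows from the form of the decision function: with a combined kernel that is a weighted sum of base kernels, any base kernel whose weight is zero can be skipped when a new sequence is classified. The sketch below counts base-kernel evaluations to show this. The per-position character-match kernels, the support vectors, and the coefficients are invented for illustration, and the helpers `make_position_kernel` and `decision_value` are hypothetical names, not taken from the paper.

```python
# Sketch: why a sparse kernel weighting makes the classifier cheaper.
# f(x) = sum_i alpha_i * y_i * k(x_i, x) + b, with k = sum_m beta_m * k_m;
# base kernels with beta_m == 0 are never evaluated.
from typing import Callable, Sequence

Kernel = Callable[[str, str], float]


def make_position_kernel(p: int) -> Kernel:
    """Hypothetical base kernel: 1.0 if the characters at position p agree."""
    def k(s: str, t: str) -> float:
        return 1.0 if s[p] == t[p] else 0.0
    return k


def decision_value(
    x: str,
    support_vectors: Sequence[str],
    alpha_y: Sequence[float],  # alpha_i * y_i for each support vector
    b: float,
    base_kernels: Sequence[Kernel],
    beta: Sequence[float],
) -> tuple[float, int]:
    """Return f(x) and the number of base-kernel evaluations performed."""
    # Keep only the kernels that actually carry weight.
    active = [(w, k) for w, k in zip(beta, base_kernels) if w != 0.0]
    evaluations = 0
    value = b
    for sv, ay in zip(support_vectors, alpha_y):
        k_val = 0.0
        for w, k in active:
            k_val += w * k(sv, x)
            evaluations += 1
        value += ay * k_val
    return value, evaluations


if __name__ == "__main__":
    kernels = [make_position_kernel(p) for p in range(8)]
    beta = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
    svs = ["ACGTACGT", "AGGTACCT", "TTGAACGT"]
    alpha_y = [0.7, -0.4, 0.2]
    f, n = decision_value("ACGTTTTT", svs, alpha_y, -0.1, kernels, beta)
    print(f, n)  # n == 6 instead of 3 * 8 = 24
```

The saving grows with sparsity: with only two of eight weights non-zero, each support vector costs two base-kernel evaluations instead of eight.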


Summary

Introduction

Kernel-based methods such as Support Vector Machines (SVMs) have proven to be powerful for sequence analysis problems that frequently appear in computational biology, and SVMs using a variety of string kernels have been successfully applied to biological sequence classification problems. However, it does not suffice for an algorithm merely to detect a biological signal in the sequence; it should also provide a means of interpreting its solution in order to gain biological insight. One of the problems with kernel methods, compared to probabilistic methods (such as position weight matrices or interpolated Markov models [5]), is that the resulting decision function (1) is hard to interpret and, hence, difficult to use for extracting relevant biological knowledge (see [3,6]). We approach this problem by considering convex combinations of M kernels, i.e.

$$k(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} \beta_m \, k_m(\mathbf{x}, \mathbf{x}'), \qquad \beta_m \ge 0, \quad \sum_{m=1}^{M} \beta_m = 1.$$
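A convex combination of kernels can be formed directly on precomputed Gram matrices. The NumPy sketch below assumes M base Gram matrices stacked into one array and writes the weights as `beta` (our notation): non-negative and summing to one. It checks those constraints and confirms numerically that the result stays positive semidefinite, which is what makes the combination a valid kernel. The function name `combine_kernels` is hypothetical; choosing the weights, which is the actual MKL problem, is not shown.

```python
import numpy as np


def combine_kernels(grams: np.ndarray, beta: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Convex combination K = sum_m beta[m] * grams[m].

    grams: array of shape (M, n, n) holding M base Gram matrices.
    beta:  array of shape (M,) with beta >= 0 and sum(beta) == 1.
    """
    beta = np.asarray(beta, dtype=float)
    if grams.ndim != 3 or grams.shape[0] != beta.shape[0]:
        raise ValueError("grams must have shape (M, n, n) matching len(beta)")
    if np.any(beta < -tol) or abs(beta.sum() - 1.0) > tol:
        raise ValueError("beta must lie on the probability simplex")
    # Contract the kernel axis: K[i, j] = sum_m beta[m] * grams[m, i, j]
    return np.tensordot(beta, grams, axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, M = 5, 3
    # M random positive semidefinite base Gram matrices of the form A A^T.
    grams = np.stack([a @ a.T for a in rng.normal(size=(M, n, 4))])
    beta = np.array([0.7, 0.0, 0.3])
    K = combine_kernels(grams, beta)
    # A convex combination of PSD matrices is PSD (up to rounding error).
    print(np.linalg.eigvalsh(K).min() >= -1e-10)
```

Restricting the weights to the simplex keeps the combined kernel on the same scale as its parts, and zero weights switch a base kernel off entirely, which is what allows the learned weighting to be read as a statement about which kernels matter.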

