SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

Marina M-C Vidovic, Marius Kloft, Nico Görnitz, Gunnar Rätsch, Klaus-Robert Müller

doi:10.1371/journal.pone.0144782

Abstract

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, owing to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although they are a major step towards explaining trained SVM models, their size grows exponentially with the length of the motif, which makes manual inspection feasible only for comparatively small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework considers the motifs as free parameters in a probabilistic model, a task that can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set.
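To make the exponential growth mentioned above concrete, the following minimal sketch counts the entries of a table holding one score per (k-mer, start position) pair over the four-letter DNA alphabet. The function name `poim_entries` and the sequence length of 100 are illustrative assumptions, not taken from the paper or its code.

```python
# Illustrative sketch only: counts table entries, it does not compute a POIM.
ALPHABET_SIZE = 4  # A, C, G, T


def poim_entries(k: int, seq_len: int) -> int:
    """Number of (k-mer, start position) pairs for sequences of length seq_len."""
    if k < 1 or k > seq_len:
        raise ValueError("k must satisfy 1 <= k <= seq_len")
    n_kmers = ALPHABET_SIZE ** k      # grows exponentially in k
    n_positions = seq_len - k + 1     # valid start positions of a k-mer
    return n_kmers * n_positions


if __name__ == "__main__":
    # seq_len=100 is an arbitrary choice for illustration.
    for k in (1, 3, 5, 8, 12):
        print(k, poim_entries(k, seq_len=100))
```

Each additional nucleotide multiplies the number of distinct k-mers by four, which is why inspecting such a table by eye quickly becomes impractical as k grows.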

Highlights

Major technological advances in sequencing techniques within the past decade have facilitated a deeper understanding of the mechanisms underlying the functionality and evolution of organisms.
The main contributions of this work can be summarized as follows: 1. Advancing the work of [15] on positional oligomer importance matrices (POIMs), we propose a novel probabilistic framework to go the full way from the output of a state-of-the-art WD-kernel support vector machine (SVM) via POIMs to the relevant motifs truly underlying the SVM predictions.
To deal with the exponentially large feature space associated with the WD kernel, we propose a very efficient numerical framework based on numerous speed-ups such as bit-shift operations, highly efficient scalar multiplications, and advanced sequence decomposition techniques, and we provide a free open-source implementation, available at https://github.com/mcvidomi/poim2motif.git
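The highlights mention bit-shift operations as one of the speed-ups but do not say how they are used. As a hedged illustration of the general idea, and not the authors' implementation, the sketch below packs each nucleotide into two bits so that every k-mer in a sequence maps to an integer index via a rolling shift-and-mask. The names `CODE` and `kmer_indices` are hypothetical.

```python
from typing import List

# Assumed 2-bit encoding of nucleotides (an illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_indices(seq: str, k: int) -> List[int]:
    """Integer index of every k-mer in seq, ordered by start position."""
    if k < 1 or k > len(seq):
        return []
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    idx = 0
    out = []
    for i, base in enumerate(seq):
        # Shift in the new base; the mask drops the base that left the window.
        idx = ((idx << 2) | CODE[base]) & mask
        if i >= k - 1:
            out.append(idx)
    return out


if __name__ == "__main__":
    print(kmer_indices("ACGT", 2))
```

With this encoding each new k-mer index is obtained from the previous one in constant time, rather than by re-reading all k characters.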

Summary

Introduction

Major technological advances in sequencing techniques within the past decade have facilitated a deeper understanding of the mechanisms underlying the functionality and evolution of organisms. Considering the sheer size of a genome, this comes at the expense of an [...]

KRM is thankful for partial funding by the National Research Foundation of Korea (http://www.nrf.re.kr/nrf_eng_cms/), funded by the Ministry of Education, Science, and Technology in the BK21 program, and by the German Ministry for Education and Research (http://www.bmbf.de/en/) as Berlin Big Data Center BBDC, funding mark 01IS14013A. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.



Journal: PLOS ONE
Publication Date: Dec 21, 2015
Citations: 7
License type: CC BY 4.0
