Abstract

String kernels model sequence similarity directly, without first extracting numerical features in a vector space. Because they can capture complex traits of the sequences, string kernels often achieve better prediction performance. RNA interference (RNAi) is a cellular defense mechanism with many biological and therapeutic applications; its target messenger RNAs and initiating short RNAs can be represented as strings, so string kernels can be applied for training and prediction. While most existing string kernels were developed for general-purpose sequences and have been applied to text and protein classification, the RNA string kernel is specifically designed to model the mismatches, GU wobbles, and bulges of RNA biology and has been applied to RNAi off-target evaluation. We adapt the RNA string kernel to compute the similarity of siRNA sequences and use it in support vector regression to predict siRNA silencing efficacy. We evaluate the RNA kernel against the spectrum kernel, the string subsequence kernel of arbitrary mismatch, the randomized string kernel, and numerical kernels computed from features extracted according to siRNA design rules. We also give insights into computational performance and into the properties the RNA kernel shares with, and those that distinguish it from, the other kernels. Empirical results on biological data sets demonstrate that the RNA string kernel compared favorably with most existing string kernels and achieved significant improvements over kernels computed from numerical descriptors extracted according to structural and thermodynamic rules. The string kernels also compared favorably with other methods reported in related work. Furthermore, the RNA string kernel is simple to implement and fast to compute.
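
To illustrate what comparing sequences "without extracting numerical features" can look like in practice, the following is a minimal sketch of the k-spectrum kernel, one of the baseline kernels named above. It is not the RNA string kernel itself, which additionally models mismatches, GU wobbles, and bulges. The function names, the choice k = 3, and the example sequences are assumptions made for illustration only and are not taken from the paper.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    fs = spectrum_features(s, k)
    ft = spectrum_features(t, k)
    # A Counter returns 0 for k-mers it has not seen, so only shared k-mers contribute.
    return sum(count * ft[kmer] for kmer, count in fs.items())


def normalized_spectrum_kernel(s, t, k=3):
    """Cosine-normalized kernel value in [0, 1]; 0.0 if either sequence has no k-mers."""
    denom = (spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k)) ** 0.5
    return spectrum_kernel(s, t, k) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical short RNA strings, chosen only to exercise the functions.
    a, b = "AUGCUAGC", "AUGCAAGC"
    print(spectrum_kernel(a, b, k=3))
    print(normalized_spectrum_kernel(a, b, k=3))
```

A matrix of such kernel values over all pairs of training sequences could then be passed to a support vector regression implementation that accepts precomputed kernels, which is the general setting the abstract describes.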
