Abstract

This paper focuses on the use of support vector machines for a typical context-dependent classification task, splice site prediction. For this type of problem, it has been shown that a context-based approach should be preferred over a transformation approach, because the former can easily incorporate statistical measures or plug sensitivity information directly into distance functions. We design three types of context-sensitive kernel functions: polynomial-based, radial basis function-based and negative distance-based kernels. The experimental results show that the radial basis function-based kernel with information gain weighting achieves the best accuracy and always outperforms its simple non-sensitive counterpart in both accuracy and model complexity. With well-designed features and carefully chosen context sizes, our system predicts splice sites with fairly high accuracy, reaching FP95% rates of 3.94 for donor sites and 5.98 for acceptor sites, which is close to the current state of the art.
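To make the idea of an information-gain-weighted, radial basis function-style kernel more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the formulation used in the paper: the function names (`entropy`, `information_gain`, `weighted_rbf_kernel`), the toy four-nucleotide windows, the labels and the `gamma` value are all hypothetical, and the distance is assumed to be a simple weighted mismatch count between two windows.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(sequences, labels, position):
    """Information gain of the symbol at `position` with respect to the labels."""
    # Group the labels by the symbol observed at this position.
    groups = {}
    for seq, label in zip(sequences, labels):
        groups.setdefault(seq[position], []).append(label)
    # Expected entropy that remains after splitting on the symbol.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def weighted_rbf_kernel(x, y, weights, gamma=1.0):
    """Assumed kernel form: exp(-gamma * weighted mismatch distance)."""
    # Each mismatching position contributes its weight to the distance.
    distance = sum(w for a, b, w in zip(x, y, weights) if a != b)
    return math.exp(-gamma * distance)


# Hypothetical toy data: four windows, label 1 = true site, 0 = decoy.
sequences = ["AGGT", "CGGT", "AGCT", "TGAT"]
labels = [1, 1, 0, 0]

# One weight per window position, estimated from the toy data.
weights = [information_gain(sequences, labels, i) for i in range(4)]

k_same = weighted_rbf_kernel("AGGT", "AGGT", weights)
k_weak = weighted_rbf_kernel("AGGT", "CGGT", weights)    # differs at a weakly informative position
k_strong = weighted_rbf_kernel("AGGT", "AGCT", weights)  # differs at a highly informative position
```

In this sketch a mismatch at a position with high information gain lowers the kernel value more than a mismatch at a weakly informative one, and positions that carry no information (weight 0) do not affect the similarity at all. An unweighted counterpart would correspond to setting every weight to 1.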
