Abstract

Accurate alternative splice site (ASS) recognition is an important and difficult problem in gene identification, and the average recognition rate is still about 85% [1]. Many statistical pattern recognition methods, such as neural networks (NNs) and support vector machines (SVMs), have been used for this task [2,3]. Among them, SVM can construct a good high-dimensional learning model from a limited training set and generalizes well, which gives it distinct advantages in small-sample, non-linear, and high-dimensional pattern recognition problems [4]. To reduce the impact of noise samples on the construction of the optimal hyperplane, the fuzzy SVM (FSVM) method was proposed [5]. Each sample is assigned its own membership and therefore contributes differently to the objective function. Because noise samples have smaller memberships, their effect on the separating hyperplane is reduced or eliminated. The design of the fuzzy membership function (FMF) is critical for FSVM [6,7]. A good FMF should assign higher memberships to support vectors and lower memberships to noise samples. The FMF is generally constructed from the distances between samples and class centers [8], from tightness defined by a mixed kernel function [9], or from tightness defined by a mixed kernel function in feature space [10]. These methods reduce the impact of noise samples to some extent, but they also reduce the memberships of support vectors. Here we designed a new membership calculation method that simultaneously reduces the memberships of noise samples and increases those of support vectors.

Consider a fuzzy training sample set $S = \{(x_1, y_1, s_1), (x_2, y_2, s_2), \ldots, (x_n, y_n, s_n)\}$, where $x_i$ is a sample vector, $y_i \in \{-1, 1\}$ is the sample category label, and $0 \le s_i \le 1$ is the fuzzy membership of the sample, which reflects the importance of $x_i$. The FSVM decision function is
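To make the role of the memberships concrete, the following minimal sketch illustrates the distance-to-class-center idea mentioned above, not the new method proposed here. It assumes a commonly used linear form, s_i = 1 - d_i / (r + delta), where d_i is the distance of a sample to its class center and r is the class radius; whether [8] uses exactly this form is an assumption. The function names, the toy data, and the hinge-loss form of the weighted objective are likewise hypothetical and chosen only for illustration.

```python
import math


def class_center_memberships(samples, labels, delta=1e-6):
    """Membership of each sample from its distance to its own class center.

    Assumed form (illustrative): s_i = 1 - d_i / (r_class + delta),
    where r_class is the largest distance within the class and delta > 0
    keeps every membership strictly positive.
    """
    # Mean vector (center) of each class.
    centers = {}
    for label in set(labels):
        members = [x for x, y in zip(samples, labels) if y == label]
        dim = len(members[0])
        centers[label] = [sum(x[k] for x in members) / len(members) for k in range(dim)]

    # Euclidean distance of every sample to the center of its class.
    dists = [math.dist(x, centers[y]) for x, y in zip(samples, labels)]

    # Class radius: the largest within-class distance.
    radius = {
        label: max(d for d, y in zip(dists, labels) if y == label)
        for label in centers
    }
    return [1.0 - d / (radius[y] + delta) for d, y in zip(dists, labels)]


def fuzzy_hinge_objective(w, b, samples, labels, memberships, C=1.0):
    """Membership-weighted soft-margin objective for a linear classifier.

    Computes 0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w . x_i + b)),
    so a sample with a small membership s_i contributes little to the penalty.
    """
    margin_term = 0.5 * sum(wk * wk for wk in w)
    penalty = 0.0
    for x, y, s in zip(samples, labels, memberships):
        score = sum(wk * xk for wk, xk in zip(w, x)) + b
        penalty += s * max(0.0, 1.0 - y * score)
    return margin_term + C * penalty


# Toy data (hypothetical): the fourth positive sample lies far from its class.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (4.0, 4.0),
     (-1.0, -1.0), (-1.2, -0.8), (-0.8, -1.2)]
y = [1, 1, 1, 1, -1, -1, -1]

s = class_center_memberships(X, y)
for xi, yi, si in zip(X, y, s):
    print(xi, yi, round(si, 4))

# Compare the objective with uniform weights and with fuzzy memberships.
w, b = [0.25, 0.25], 0.0
print(fuzzy_hinge_objective(w, b, X, y, [1.0] * len(X)))
print(fuzzy_hinge_objective(w, b, X, y, s))
```

In this toy set the distant positive sample receives the smallest membership in its class. Samples on the edge of the compact negative class are also pushed toward zero, which is the kind of side effect on boundary samples that motivates a different membership design.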
