Abstract
This paper considers the classification problem using support vector (SV) machines and investigates how to maximally reduce the size of the training set without losing information. Under separable data set assumptions, we derive the exact conditions stating which observations can be discarded without diminishing the overall information content. For this purpose, we introduce the concept of potential SVs, i.e., those data that can become SVs when future data become available. To complement this, we also characterize the set of discardable vectors (DVs), i.e., those data that, given the current data set, can never become SVs. Thus, these vectors are useless for future training purposes and can eventually be removed without loss of information. Then, we provide an efficient algorithm based on linear programming that returns the potential and DVs by constructing a simplex tableau. Finally, we compare it with alternative algorithms available in the literature on some synthetic data as well as on data sets from standard repositories.
Submitted Version (Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have