Abstract

The noise intolerance and storage requirements of nearest-neighbor-based algorithms are the two main obstacles to their use for solving complex classification tasks. Since the early 1970s, beginning with the work of Hart and Gates, many methods have been proposed to deal with these problems by eliminating mislabeled instances and selecting relevant prototypes. These methods typically share the distinctive feature of directly optimizing accuracy during the selection process. In this paper, we present an original approach that adapts the properties of boosting (which optimizes another criterion) to the field of prototype selection. Whereas in a standard boosting algorithm the final classifier combines a set of weak hypotheses, each being a classifier built according to a given distribution over the training data, our approach defines each weak hypothesis as a single weighted prototype. The distribution update (a key step of boosting) and the criterion optimized during the process are slightly modified to allow an efficient adaptation of boosting to prototype selection. To demonstrate the merits of our new algorithm, called PSBOOST, we conducted a broad experimental study comparing it with state-of-the-art prototype selection algorithms. Taking into account several performance measures, such as storage reduction, noise tolerance, generalization accuracy, and learning speed, we find that PSBOOST provides a good balance among all of these measures. A statistical analysis is presented to validate the results.
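To make the idea concrete, the following is a minimal Python sketch of boosting-style prototype selection under assumptions the abstract does not specify: binary labels in {-1, +1}, a weak hypothesis that votes its prototype's own label inside that prototype's nearest-neighbor cell and abstains elsewhere, and a standard AdaBoost-style distribution update (the paper's actual update rule and optimized criterion are modified versions of this). The function name and all details here are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

def psboost_sketch(X, y, T=20):
    """Boosting-style prototype selection sketch for binary labels
    in {-1, +1}.  The weak-hypothesis form and the distribution
    update below are assumptions, not the paper's exact definitions."""
    n = len(X)
    # Nearest training neighbor of every instance (excluding itself):
    # prototype j "covers" the instances whose nearest neighbor is j.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)

    D = np.full(n, 1.0 / n)          # distribution over the training data
    selected = []                     # chosen (prototype index, weight) pairs
    for _ in range(T):
        best_j, best_edge = None, 0.0
        for j in range(n):
            cell = nn == j            # instances covered by prototype j
            # Edge of the hypothesis "predict y[j] inside the cell,
            # abstain elsewhere" under the current distribution D.
            edge = np.sum(D[cell] * y[cell] * y[j])
            if edge > best_edge:
                best_j, best_edge = j, edge
        if best_j is None:            # no remaining prototype helps
            break
        # Pseudo-error treating abstentions as half right, half wrong,
        # then the usual AdaBoost weight and multiplicative update.
        err = 0.5 - 0.5 * best_edge
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        selected.append((best_j, alpha))
        margin = np.where(nn == best_j, y * y[best_j], 0.0)
        D *= np.exp(-alpha * margin)
        D /= D.sum()
    return selected                   # a reduced, weighted prototype set
```

In this sketch, a query point would be classified by the weighted vote of the selected prototypes covering it; because misclassified training instances see their mass in D grow, later rounds are driven toward prototypes in the hard regions of the input space, which is the mechanism the boosting adaptation exploits.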
