Abstract

The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines them into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm (IDIBL), which seeks to reduce storage, improve execution speed, and increase generalization accuracy compared to the basic nearest-neighbor algorithm and other learning models. IDIBL tunes its own parameters using a new measure of fitness that combines confidence with cross-validation accuracy in order to avoid the discretization problems of more traditional leave-one-out cross-validation. In our experiments, IDIBL achieves higher generalization accuracy than other, less comprehensive instance-based learning algorithms, while requiring less than one-fourth the storage of the nearest-neighbor algorithm and improving execution speed by a corresponding factor. In experiments on twenty-one data sets, IDIBL also achieves higher generalization accuracy than that reported for sixteen major machine learning and neural network models.
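
The abstract notes that raw leave-one-out (LOO) accuracy suffers from discretization: on n instances it changes only in steps of 1/n, so different parameter settings often tie, and blending in a continuous confidence term breaks such ties. The sketch below is a minimal illustration of that idea, not the paper's exact formulation: the function name loo_knn_fitness, the alpha weighting, and the plain Euclidean distance are all illustrative assumptions (IDIBL itself uses a heterogeneous distance function and its own confidence definition).

```python
import numpy as np

def loo_knn_fitness(X, y, k=3, alpha=0.5):
    """Hypothetical fitness blending discrete LOO accuracy with the
    average confidence assigned to each instance's true class."""
    n = len(X)
    correct = 0
    conf_sum = 0.0
    for i in range(n):
        # Distances from instance i to all others (Euclidean here,
        # purely for illustration).
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                     # exclude the instance itself (LOO)
        neighbors = np.argsort(d)[:k]     # indices of the k nearest neighbors
        votes = np.bincount(y[neighbors], minlength=y.max() + 1)
        conf_sum += votes[y[i]] / k       # fraction of neighbors voting for the true class
        if votes.argmax() == y[i]:
            correct += 1
    cv_acc = correct / n                  # discrete LOO accuracy (steps of 1/n)
    avg_conf = conf_sum / n               # continuous confidence term
    return alpha * cv_acc + (1 - alpha) * avg_conf

# Example: compare two candidate k values on toy data; the continuous
# term can separate settings whose LOO accuracies tie.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] > 0).astype(int)
print(loo_knn_fitness(X, y, k=1), loo_knn_fitness(X, y, k=5))
```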
