A Density-Based Approach for Instance Selection

Joel Luis Carbonera, Mara Abel

doi:10.1109/ictai.2015.114

Abstract

Instance selection is an important preprocessing step that can be applied in many machine learning tasks. As datasets grow in size, instance selection techniques have been applied to reduce the data to a manageable volume, which in turn reduces the computational resources needed for the learning process. Instance selection algorithms can also be applied to remove useless, erroneous, or noisy instances before learning algorithms are applied, which can improve accuracy in classification problems. In recent years, several approaches for instance selection have been proposed. However, most of them have long runtimes and therefore cannot be used for dealing with large datasets. In this paper, we propose a simple and effective density-based approach for instance selection. Our approach, called LDIS (local density-based instance selection), evaluates the instances of each class separately and keeps only the densest instances in a given (arbitrary) neighborhood. This ensures a reasonably low time complexity. Our approach was evaluated on 15 well-known datasets, and its performance was compared with that of 5 state-of-the-art algorithms, considering three measures: accuracy, reduction, and effectiveness. To evaluate the accuracy achieved with the datasets produced by the algorithms, we applied the KNN algorithm. The results show that LDIS achieves a performance (in terms of the balance between accuracy and reduction) that is better than or comparable to that of the other algorithms considered in the evaluation.
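The selection rule described in the abstract (process each class separately, keep only the instances that are densest within their neighborhood) can be illustrated with a minimal Python sketch. The abstract does not define the density function, the neighborhood, or the reduction measure, so the choices below are assumptions made for illustration only: density is taken as the negative mean Euclidean distance to the other instances of the same class, the neighborhood is the k nearest same-class instances, and reduction is the fraction of instances removed. The names `ldis_sketch`, `euclidean`, `reduction`, and the parameter `k` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ldis_sketch(points, labels, k=3):
    """Return the sorted indices of the instances kept by the sketch.

    Assumptions (not taken from the abstract): density is the negative
    mean distance to the other same-class instances, and the neighborhood
    of an instance is its k nearest same-class instances.
    """
    # Group instance indices by class so each class is handled on its own.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected = []
    for members in by_class.values():
        # Assumed density: higher (closer to zero) means more central.
        density = {}
        for i in members:
            others = [j for j in members if j != i]
            if others:
                total = sum(euclidean(points[i], points[j]) for j in others)
                density[i] = -total / len(others)
            else:
                density[i] = 0.0  # a lone instance is trivially the densest

        # Keep an instance only if no neighbor is strictly denser.
        for i in members:
            others = [j for j in members if j != i]
            neighbors = sorted(
                others, key=lambda j: euclidean(points[i], points[j])
            )[:k]
            if all(density[i] >= density[j] for j in neighbors):
                selected.append(i)
    return sorted(selected)


def reduction(original_size, selected_size):
    """Assumed definition: fraction of instances removed by the selection."""
    return (original_size - selected_size) / original_size


if __name__ == "__main__":
    # Two small classes, each a unit square with a central point.
    points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
              (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5)]
    labels = ["a"] * 5 + ["b"] * 5
    kept = ldis_sketch(points, labels, k=3)
    print(kept, reduction(len(points), len(kept)))
```

Because each instance is compared only against same-class neighbors, the work splits into independent per-class passes. This sketch computes all pairwise distances within a class, which is quadratic in the class size; it makes no claim about matching the complexity of the authors' implementation. Ties in density are kept on both sides, which is another choice of the sketch.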

Full Text