An Efficient Approach for Instance Selection

Joel Luís Carbonera

doi:10.1007/978-3-319-64283-3_17

Abstract

The volume of data now being produced challenges our ability to convert it into useful knowledge. For this reason, data mining approaches have been applied to extract useful knowledge from this big data. To deal with the increasing size of datasets, instance selection techniques have been applied to reduce the data to a manageable volume and, consequently, to reduce the computational resources needed to apply data mining approaches. However, most of the proposed approaches for instance selection have a high time complexity and, as a result, cannot be applied to big data. In this paper, we propose a novel approach for instance selection called XLDIS. This approach adopts the notion of local density to select the most representative instances of each class of the dataset, while keeping a reasonably low time complexity. The approach was evaluated on 20 well-known datasets in a classification task, and its performance was compared with that of 6 state-of-the-art algorithms, considering three measures: accuracy, reduction, and effectiveness. The results show that, in general, the XLDIS algorithm provides the best trade-off between accuracy and reduction.

Full Text