Abstract

Nowadays, the volume of data being produced challenges our capability to convert it into useful knowledge. Data mining approaches have therefore been applied to extract useful knowledge from these large volumes of data. To deal with the increasing size of datasets, instance selection techniques have been used to reduce the data to a manageable volume and, consequently, to reduce the computational resources needed to apply data mining approaches. However, most of the proposed instance selection approaches have high time complexity and, for this reason, cannot be applied to big data. In this paper, we propose a novel instance selection approach called XLDIS. This approach adopts the notion of local density to select the most representative instances of each class of the dataset, while keeping the time complexity reasonably low. The approach was evaluated on 20 well-known datasets in a classification task, and its performance was compared with that of 6 state-of-the-art algorithms using three measures: accuracy, reduction, and effectiveness. The obtained results show that, in general, the XLDIS algorithm provides the best trade-off between accuracy and reduction.
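The abstract describes the selection criterion only at a high level. What follows is a minimal Python sketch of a generic per-class, local-density instance selection, not the XLDIS algorithm as published. It assumes that an instance's density is the negated mean Euclidean distance to its `k` nearest same-class neighbours, and that an instance is kept when none of those neighbours is denser. It also computes reduction as 1 - |S|/|T|, the usual definition. The function names `local_density_selection` and `reduction`, the parameter `k`, and the density definition are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def local_density_selection(X, y, k=3):
    """Hypothetical local-density instance selection (illustrative only, NOT XLDIS).

    Assumption: density(x) = -(mean distance from x to its k nearest
    same-class neighbours), so tightly packed regions score higher.
    An instance is kept if no instance in its neighbourhood is denser.
    """
    # Group instance indices by class label; selection is done per class.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    selected = []
    for idx in by_class.values():
        # k nearest same-class neighbours of each instance (naive O(n^2 log n)).
        neighbours = {}
        for i in idx:
            others = sorted((j for j in idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))
            neighbours[i] = others[:k]

        # Local density: higher when the neighbours are closer.
        density = {}
        for i in idx:
            nb = neighbours[i]
            if nb:
                density[i] = -sum(math.dist(X[i], X[j]) for j in nb) / len(nb)
            else:
                density[i] = 0.0  # singleton class: trivially kept below

        # Keep instances that are local density maxima in their neighbourhood.
        for i in idx:
            if all(density[i] >= density[j] for j in neighbours[i]):
                selected.append(i)

    return sorted(selected)


def reduction(n_original, n_selected):
    """Fraction of the training set removed: 1 - |S| / |T|."""
    return 1.0 - n_selected / n_original


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (2, 0), (10, 0), (20, 20), (21, 20), (22, 20)]
    y = ["a", "a", "a", "a", "b", "b", "b"]
    sel = local_density_selection(X, y, k=2)
    print(sel)  # -> [1, 5]
    print(reduction(len(X), len(sel)))
```

Each class keeps at least one instance, because the densest instance of a class is never outranked by its neighbours. This naive version compares all same-class pairs, so it is quadratic per class. It illustrates the local-density idea but does not reproduce the low time complexity claimed for XLDIS.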
