Abstract

More and more data is being collected owing to constant improvements in storage hardware and data collection techniques, and the incoming flow of data is so large that data mining techniques cannot keep up with it. The collected data often contains redundant or irrelevant features and instances that limit classification performance. Feature selection and instance selection help reduce this problem by eliminating useless data. This paper develops a set of algorithms based on Differential Evolution to achieve feature selection, instance selection, and combined feature and instance selection. The data reduction, classification accuracy, and training time are compared against those of the original data and of existing algorithms. Experiments on ten datasets of varying difficulty show that the newly developed algorithms can successfully reduce the size of the data while maintaining or increasing classification performance in most cases. In addition, the computational time is substantially reduced. This work is the first to systematically investigate a series of algorithms for feature and/or instance selection in classification. The findings show that instance selection is a much harder task than feature selection, but that with effective methods it can significantly reduce the size of the data and provide great benefit.
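To make the general idea of Differential Evolution (DE) based feature selection concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a DE/rand/1/bin scheme over continuous vectors in [0, 1], a threshold of 0.5 to decode each vector into a feature subset, and a fitness equal to leave-one-out 1-nearest-neighbour accuracy minus a small size penalty (`alpha`). All function names and parameter values here are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence


def decode(vector: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map a continuous DE vector to the indices of selected features (assumed threshold decoding)."""
    return [j for j, v in enumerate(vector) if v > threshold]


def loo_1nn_accuracy(X, y, features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    if not features:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        best_dist, best_label = float("inf"), None
        for k in range(n):
            if k == i:
                continue
            d = sum((X[i][f] - X[k][f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / n


def de_feature_selection(X, y, pop_size=20, generations=30, F=0.5, CR=0.9,
                         alpha=0.01, seed=0):
    """Hypothetical DE/rand/1/bin wrapper that searches for a good feature subset."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(vec):
        feats = decode(vec)
        # Reward accuracy; the small penalty (assumed weighting) favours smaller subsets.
        return loo_1nn_accuracy(X, y, feats) - alpha * len(feats) / dim

    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than the target.
            a, b, c = rng.sample([p for idx, p in enumerate(pop) if idx != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = a[j] + F * (b[j] - c[j])
                    trial.append(min(1.0, max(0.0, v)))  # clip back into [0, 1]
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            # Greedy selection: the trial replaces the target if it is at least as fit.
            if s >= scores[i]:
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda idx: scores[idx])
    return decode(pop[best]), scores[best]


if __name__ == "__main__":
    # Toy data: feature 0 determines the label, features 1 and 2 are uniform noise.
    data_rng = random.Random(1)
    X, y = [], []
    for i in range(40):
        label = i % 2
        X.append([label + data_rng.gauss(0, 0.1), data_rng.random(), data_rng.random()])
        y.append(label)
    selected, score = de_feature_selection(X, y)
    print("selected features:", selected, "fitness:", round(score, 4))
```

On this toy data the search is expected to keep feature 0 and discard the noise features. Instance selection could, under the same assumptions, reuse the loop with one gene per training instance and evaluate the classifier using only the retained instances; a combined method would concatenate both masks into a single vector.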
