Abstract

The k-Nearest Neighbor (k-NN) rule is widely used for classification tasks because of its simplicity and efficiency. A well-known drawback of k-NN, however, is its sensitivity to the quality of the training set, since the rule makes no assumption about the importance of each instance: noisy and superfluous instances in the training set tend to increase the classification error rate. Instance selection methods address this problem by identifying which instances of the training set the k-NN classifier should retain. Our proposal is a simple and effective way to improve instance selection methods using metric learning. It relies on a purely geometric intuition: metric learning transforms the input space so that points in the same class are simultaneously close to each other and far from points in other classes. We show that instance selection methods benefit from operating in this more “organised” space. We carried out an experimental evaluation comparing instance selection with and without metric learning on UCI benchmark data sets. The results reveal that combining metric learning with instance selection is highly beneficial: all tested instance selection methods improved significantly.
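The pipeline described above (learn a metric, transform the training set, select instances in the transformed space, then classify with k-NN) can be sketched as follows. This is an illustrative example, not the paper's implementation: Neighborhood Components Analysis from scikit-learn stands in for the metric-learning step, and a small inline implementation of Hart's Condensed Nearest Neighbor stands in for the instance selection method; the paper evaluates other combinations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis

def condensed_nn(X, y, rng):
    """Hart's CNN: grow a minimal subset whose 1-NN classifies the rest correctly."""
    idx = list(rng.permutation(len(X))[:1])  # seed the store with one random instance
    changed = True
    while changed:  # repeat passes until a full pass absorbs nothing new
        changed = False
        for i in rng.permutation(len(X)):
            if i in idx:
                continue
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
            if knn.predict(X[i:i + 1])[0] != y[i]:  # misclassified -> add to store
                idx.append(i)
                changed = True
    return np.array(idx)

rng = np.random.RandomState(0)
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Metric learning: NCA fits a linear transform that pulls same-class points
# together, giving instance selection a more "organised" space to work in.
nca = NeighborhoodComponentsAnalysis(random_state=0).fit(X_tr, y_tr)
keep = condensed_nn(nca.transform(X_tr), y_tr, rng)

# k-NN trained only on the selected instances, in the transformed space.
knn = KNeighborsClassifier(n_neighbors=1).fit(nca.transform(X_tr)[keep], y_tr[keep])
print(f"kept {len(keep)}/{len(X_tr)} instances, "
      f"test accuracy = {knn.score(nca.transform(X_te), y_te):.2f}")
```

Selecting instances after the transform typically keeps far fewer points than running CNN in the raw space, since the classes overlap less once the metric has reshaped the neighborhoods.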
