Abstract

When building machine learning models, the data must sometimes be sampled before the learning process can be applied. This step, known as instance selection, is mostly done to reduce the volume of data so that fewer computing resources are required for the learning phase. It also removes noisy data that can degrade learning quality. Although these two objectives often conflict, most current approaches offer no way to control the balance between them. We propose a reinforcement learning-based approach to instance selection, called curious instance selection (CIS), which evaluates clusters of instances using the curiosity loop architecture. The algorithm outputs a matrix representing the value of adding a cluster of instances to those already selected. This matrix enables computation of the Pareto front and demonstrates the ability to balance the noise reduction and volume reduction objectives. CIS was evaluated on five datasets, and its performance was compared with that of three state-of-the-art algorithms. Our results show that CIS not only provides enhanced flexibility but also achieves higher effectiveness (reduction times accuracy). This approach strengthens the appeal of curiosity-based algorithms in data science.
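To make the two objectives concrete, the following minimal sketch illustrates (with entirely made-up numbers, not results from the paper) how candidate instance subsets scored on volume reduction and accuracy could be compared via a Pareto front, and how the abstract's effectiveness measure (reduction times accuracy) would pick a point from that front. The function names and candidate values are illustrative assumptions, not the paper's implementation.

```python
# Illustrative only: hypothetical (reduction, accuracy) scores for a few
# candidate instance subsets; these numbers are invented for the example.

def effectiveness(reduction, accuracy):
    """Effectiveness as defined in the abstract: reduction times accuracy."""
    return reduction * accuracy

def pareto_front(points):
    """Keep points not dominated on (reduction, accuracy), both maximized.

    A point p is dominated if some other point q is at least as good on
    both objectives.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# (volume reduction, accuracy) for hypothetical subsets
candidates = [(0.9, 0.70), (0.8, 0.85), (0.5, 0.88), (0.2, 0.90), (0.6, 0.80)]
front = pareto_front(candidates)
best = max(front, key=lambda p: effectiveness(*p))
```

A practitioner could slide along the front toward higher reduction or higher accuracy, which is the kind of controllable trade-off the abstract argues most instance selection methods lack.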
