Abstract

When building machine learning models, the data must sometimes be sampled before the learning process can be applied. This step, known as instance selection, is mostly done to reduce the volume of data so that fewer computing resources are required for the learning phase. It also removes noisy data that can degrade learning quality. Although these two objectives often conflict, most current approaches offer no way to control the balance between them. We propose a reinforcement learning-based approach to instance selection, called curious instance selection (CIS), which evaluates clusters of instances using the curiosity loop architecture. The algorithm outputs a matrix representing the value of adding a cluster of instances to those already selected. This matrix enables computation of the Pareto front and demonstrates the ability to balance the noise reduction and volume reduction objectives. CIS was evaluated on five datasets, and its performance was compared with that of three state-of-the-art algorithms. Our results show that CIS not only provides enhanced flexibility but also achieves higher effectiveness (reduction times accuracy). This approach strengthens the appeal of curiosity-based algorithms in data science.
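To make the two objectives concrete, the following minimal sketch illustrates (with entirely made-up numbers, not results from the paper) how candidate instance subsets scored on volume reduction and accuracy could be compared via a Pareto front, and how the abstract's effectiveness measure (reduction times accuracy) would pick a point from that front. The function names and candidate values are illustrative assumptions, not the paper's implementation.

```python
# Illustrative only: hypothetical (reduction, accuracy) scores for a few
# candidate instance subsets; these numbers are invented for the example.

def effectiveness(reduction, accuracy):
    """Effectiveness as defined in the abstract: reduction times accuracy."""
    return reduction * accuracy

def pareto_front(points):
    """Keep points not dominated on (reduction, accuracy), both maximized.

    A point p is dominated if some other point q is at least as good on
    both objectives.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# (volume reduction, accuracy) for hypothetical subsets
candidates = [(0.9, 0.70), (0.8, 0.85), (0.5, 0.88), (0.2, 0.90), (0.6, 0.80)]
front = pareto_front(candidates)
best = max(front, key=lambda p: effectiveness(*p))
```

A practitioner could slide along the front toward higher reduction or higher accuracy, which is the kind of controllable trade-off the abstract argues most instance selection methods lack.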
