On removing conflicts for machine learning

Sergio Ledesma, Mario-Alberto Ibarra-Manzano, Dora-Luz Almanza-Ojeda, Juan Gabriel Avina-Cervantes, Eduardo Cabal-Yepez

doi:10.1016/j.eswa.2022.117835

Abstract

A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may contain learning conflicts that degrade the performance of the ML system. A learning conflict arises when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. Our method is based on a Genetic Algorithm (GA) that tries to keep the samples that are free of conflicts and to remove the samples with conflicts. Each individual in the GA represents a possible dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while removing learning conflicts. Additionally, the fitness function includes the Mean-Squared Error (MSE), which validates the performance of the ML system. The algorithm is designed to keep as many samples as possible while the ML system exhibits the highest possible performance. The proposal therefore consists of first cleaning the dataset: the individual with the best performance in the GA is identified, and it indicates which samples should be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained using the data with learning conflicts. After removing the conflicts, a second artificial neural network is trained using the cleaned dataset. A noticeable reduction in the MSE is observed when the neural network is trained using the cleaned dataset.
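
The two central notions of the abstract, a learning conflict and a fitness that trades retention against MSE, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the Euclidean distance with tolerances `input_tol` and `target_tol` as the test for "similar inputs, different targets", the retention error defined as the fraction of removed samples, and the weighted sum with `weight` as the fitness. The abstract does not state the actual formulas.

```python
import numpy as np

def find_conflicts(X, y, input_tol=1e-3, target_tol=1e-3):
    """Return index pairs (i, j), i < j, whose inputs lie within input_tol
    (Euclidean distance) but whose targets differ by more than target_tol.
    Both tolerances are illustrative assumptions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            close_inputs = np.linalg.norm(X[i] - X[j]) <= input_tol
            different_targets = abs(y[i] - y[j]) > target_tol
            if close_inputs and different_targets:
                pairs.append((i, j))
    return pairs

def retention_error(mask):
    """Assumed definition: fraction of samples dropped by a candidate mask
    (0.0 means every sample is kept)."""
    mask = np.asarray(mask, dtype=bool)
    return 1.0 - mask.mean()

def fitness(mask, mse, weight=0.5):
    """Hypothetical fitness (lower is better): a weighted sum of the MSE
    and the retention error. The weighting scheme is an assumption."""
    return weight * mse + (1.0 - weight) * retention_error(mask)

# Toy dataset: samples 0 and 1 share the same input but have different targets.
X = np.array([[0.0, 1.0], [0.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, 5.0, 2.0])
conflicts = find_conflicts(X, y)

# One GA individual, encoded as a boolean mask that drops sample 1.
mask = np.array([True, False, True])
```

In this encoding each GA individual is a boolean mask over the samples, which matches the statement that an individual represents a possible dataset. The `mse` argument would come from training and validating a model on the samples the mask keeps.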

Full Text