Abstract

A Machine Learning (ML) system learns from a set of samples called the training set. In some cases, the training set may contain learning conflicts that degrade the performance of the ML system. A learning conflict occurs when two or more samples in a dataset have similar input values but different target values. In this work, we propose a method to remove learning conflicts from a dataset. Our method is based on a Genetic Algorithm (GA) that tries to keep the samples that are free of conflicts and to remove the samples involved in conflicts. Each individual in the GA represents a candidate dataset. We introduce the concept of retention error in the fitness function, which describes how many samples are kept while learning conflicts are removed. Additionally, the fitness function includes the Mean-Squared Error (MSE), which validates the performance of the ML system. The algorithm is designed to keep as many samples as possible while the ML system achieves the highest possible performance. The proposal therefore cleans the dataset first, comparing the individuals and highlighting the one with the best performance in the GA, which recommends which samples should be included for training and testing. Three different datasets with learning conflicts are used to test the proposed methodology. For each dataset, one artificial neural network is trained on the version that contains learning conflicts. After the conflicts are removed, a second artificial neural network is trained on the cleaned dataset. A noticeable reduction in the MSE is observed when the neural network is trained on the cleaned dataset.
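The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. Each individual is assumed to be a binary keep/drop mask over the samples. The fitness is assumed to be an error term plus a weighted retention error (fraction of samples removed), to be minimized. To keep the example dependency-free, the neural-network MSE is swapped for a leave-one-out 1-nearest-neighbour MSE, which also penalizes conflicting samples. The toy dataset, the conflict thresholds `eps` and `delta`, the `weight`, and the GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism) are all hypothetical choices.

```python
import math
import random

# Hypothetical toy dataset of (inputs, target) pairs. Samples 2 and 3 form a
# learning conflict: nearly identical inputs, very different targets.
DATA = [
    ((0.0, 0.0), 0.0),
    ((1.0, 0.0), 1.0),
    ((2.0, 0.0), 2.0),
    ((2.01, 0.0), 9.0),  # conflicts with the sample above
    ((3.0, 0.0), 3.0),
    ((4.0, 0.0), 4.0),
]


def find_conflicts(data, eps=0.1, delta=1.0):
    """Index pairs whose inputs are closer than eps but whose targets differ by more than delta."""
    pairs = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            (xi, yi), (xj, yj) = data[i], data[j]
            if math.dist(xi, xj) < eps and abs(yi - yj) > delta:
                pairs.append((i, j))
    return pairs


def loo_nn_mse(data, mask):
    """Leave-one-out 1-nearest-neighbour MSE on the kept samples.

    Assumed stand-in for the MSE of a trained neural network."""
    kept = [data[i] for i, bit in enumerate(mask) if bit]
    if len(kept) < 2:
        return float("inf")  # too few samples to evaluate
    err = 0.0
    for i, (xi, yi) in enumerate(kept):
        # Predict yi from the closest other kept sample.
        _, y_pred = min((math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(kept) if j != i)
        err += (yi - y_pred) ** 2
    return err / len(kept)


def retention_error(mask):
    """Fraction of samples removed (0.0 means every sample is kept)."""
    return 1.0 - sum(mask) / len(mask)


def fitness(data, mask, weight=10.0):
    """Lower is better: model error plus weighted retention error (weight is hypothetical)."""
    return loo_nn_mse(data, mask) + weight * retention_error(mask)


def genetic_clean(data, pop_size=30, generations=50, p_mut=None, seed=0):
    """Search for the keep/drop mask with the lowest fitness."""
    rng = random.Random(seed)
    n = len(data)
    p_mut = 1.0 / n if p_mut is None else p_mut

    def score(mask):
        return fitness(data, mask)

    # Start from the uncleaned dataset plus random subsets of it.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = sorted(pop, key=score)[:2]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(pop, 3), key=score)
            b = min(rng.sample(pop, 3), key=score)
            # Uniform crossover, then bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=score)


if __name__ == "__main__":
    print(find_conflicts(DATA))  # [(2, 3)]
    best = genetic_clean(DATA)
    print(best, round(fitness(DATA, best), 3))
```

On this toy data, dropping only sample 3 gives the lowest fitness: the conflict disappears from the error term at the cost of a one-sample retention penalty, while dropping more samples costs more retention error than it saves. Because the all-ones mask is in the initial population and the best masks are carried over, the returned mask is never worse than the uncleaned dataset.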
