Novel mislabeled training data detection algorithm

Weiwei Yuan, Qi Zhu, Tinghuai Ma, Donghai Guan

doi:10.1007/s00521-016-2589-9

Abstract

Mislabeled training data are a kind of noise that exists in many applications. Because they degrade learning, many filtering techniques have been proposed to identify and eliminate them. The ensemble learning-based filter (EnFilter), which employs ensemble classifiers, is the most widely used. In EnFilter, the noisy training dataset is first divided into several subsets. Each noisy subset is then checked by multiple classifiers trained on the other noisy subsets. Because the data used to train these classifiers are themselves noisy, the quality of the classifiers cannot be guaranteed, which can lead to poor noise identification results. This problem becomes more serious when the noise ratio in the training dataset is high. To solve it, this work proposes a straightforward but effective approach: instead of training the classifiers on noisy data, it uses nearly noise-free (NNF) data, which are expected to yield more reliable classifiers. To this end, a novel NNF data extraction approach is also proposed. Experimental results on a set of benchmark datasets illustrate the utility of the proposed approach.
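
The partition-based filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`knn_predict`, `ensemble_filter`), the choice of k-nearest-neighbor classifiers with different k as the ensemble members, and the majority-vote rule are all assumptions made for illustration. The abstract does not specify how NNF data are extracted, so that step is not sketched here.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_filter(data, n_subsets=3, ks=(1, 3, 5), seed=0):
    """Flag instances whose label disagrees with a majority of the classifiers.

    data: list of (feature_tuple, label) pairs, possibly containing label noise.
    Each subset is checked by classifiers trained on the remaining (noisy) subsets.
    Assumption: the "multiple classifiers" are k-NN models with different k.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    # Divide the noisy training set into n_subsets disjoint subsets.
    folds = [indices[i::n_subsets] for i in range(n_subsets)]

    flagged = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            x, y = data[i]
            # Count how many ensemble members disagree with the given label.
            wrong = sum(knn_predict(train, x, k) != y for k in ks)
            if wrong > len(ks) / 2:
                flagged.append(i)
    return sorted(flagged)


# Toy data (hypothetical): two well-separated classes, one label flipped.
data = [((0.1 * i, 0.0), 0) for i in range(15)]
data += [((10.0 + 0.1 * i, 0.0), 1) for i in range(15)]
data[4] = (data[4][0], 1)  # inject a single mislabeled instance

flagged = ensemble_filter(data)
print(flagged)
```

In this sketch the classifiers are trained on data that still contain the flipped label, which is exactly the weakness the abstract points out: as the noise ratio grows, the votes themselves become less trustworthy.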

Journal: Neural Computing and Applications
Publication Date: Sep 16, 2016
Citations: 3
