Abstract

Label noise is a common phenomenon when labeling large-scale datasets for supervised learning. Outlier detection is a recently proposed way to handle this issue: the outliers of each class are treated as potentially mislabeled data points and removed before training. However, this approach can suffer from a high false positive rate and hurt performance. In this paper, we propose a novel and effective method that combines the strengths of outlier detection and reconstruction error minimization (REM). The main idea is to add a second verification step (i.e., REM) to the outputs of outlier detection, reducing the risk of discarding points that do not fit the underlying data distribution well but nevertheless carry correct labels. Specifically, we first find the outliers in each class with a robust deep autoencoder-based outlier detector, which yields not only candidate mislabeled data but also a group of well-trained deep autoencoders. A reconstruction error minimization approach is then applied to these outliers to further filter and relabel the mislabeled data. Experimental results on the MNIST dataset show that the proposed method significantly reduces the false positive rate of outlier detection and improves the performance of both data cleaning and classification in the presence of label noise.
