Abstract

In real applications, label noise and feature noise are the two main noise sources. Like feature noise, label noise is highly detrimental to the training of classification models. Motivated by the successful application of deep learning methods to standard classification problems, this paper proposes a new framework called LNC-SDAE to handle datasets corrupted by label noise, i.e., inaccurate supervision problems. The LNC-SDAE framework consists of a preliminary label noise cleansing part and a stacked denoising autoencoder. In the preliminary label noise cleansing part, the idea of K-fold cross-validation is applied to detect and relabel mislabeled samples. After this preprocessing, the cleansed training dataset is fed into the stacked denoising autoencoder to learn a robust representation for classification. A corrupted UCI benchmark dataset and a corrupted real industrial dataset, each containing a proportion of label noise ranging from 0% to 30%, are used for testing. The experimental results demonstrate the effectiveness of LNC-SDAE and the robustness of the learned representation.
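To make the cleansing step concrete, the following is a minimal sketch of a K-fold cross-validation relabeling scheme in the spirit described above. It assumes numpy arrays and uses a logistic-regression base learner purely for illustration; the paper does not specify which classifier or relabeling rule is used inside its cleansing part, so treat this as a hypothetical reading rather than the authors' exact method.

```python
# Hypothetical sketch: K-fold cross-validation based label cleansing.
# Each sample is held out once; a classifier trained on the remaining folds
# predicts its label, and samples whose given label disagrees with the
# out-of-fold prediction are relabeled to the predicted class.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

def cleanse_labels(X, y, n_splits=5, random_state=0):
    """Return a relabeled copy of y based on out-of-fold predictions."""
    y_clean = y.copy()
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, val_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000)      # illustrative base learner
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[val_idx])
        disagree = pred != y[val_idx]                # suspected mislabeled samples
        y_clean[val_idx[disagree]] = pred[disagree]  # relabel them
    return y_clean
```

The cleansed labels returned by such a routine would then be used to train the downstream stacked denoising autoencoder and classifier.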

Highlights

  • In real applications, almost all supervised learning suffers from two types of noise: noise among feature variables and noise in label variables

  • Motivated by the successful application of deep learning methods to standard classification problems, this paper proposes a new framework called label noise cleansing (LNC)-stacked denoising autoencoder (SDAE) to handle datasets corrupted by label noise, or so-called inaccurate supervision problems (a sketch of the SDAE building block follows this list)

  • As the initial label noise ratio increases from 10% to 30%, the average classification accuracy gap between the SDAE trained on the corrupted dataset and the SDAE trained on the original dataset becomes larger
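For readers unfamiliar with the SDAE component referenced above, here is a minimal sketch of a single denoising autoencoder layer, the building block that is stacked to form an SDAE. The choice of PyTorch, Gaussian input corruption, and the layer sizes are assumptions for illustration; the paper's exact architecture and corruption scheme may differ.

```python
# Hypothetical sketch: one denoising autoencoder layer (the SDAE building block).
# The input is corrupted during training and the network is trained to
# reconstruct the clean input; the hidden code h serves as a robust feature.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim, hidden_dim, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        # Corrupt the input with Gaussian noise only in training mode.
        x_noisy = x + self.noise_std * torch.randn_like(x) if self.training else x
        h = self.encoder(x_noisy)
        return self.decoder(h), h

# Usage: train each layer to minimize reconstruction loss (e.g. nn.MSELoss()
# between the decoder output and the clean input), then stack the encoders
# and fine-tune with a classification head on the cleansed labels.
```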


Summary

Introduction

Almost all supervised learning suffers from two types of noise: noise among the feature variables (process variables) and noise in the label variables. A first type of method for handling label noise is to add a label noise filter module beforehand to detect the most probable noisy samples, for example using the nearest neighbor criterion [11, 12] or the cumulative information criterion [13]. Data points identified as mislabeled samples, or so-called outliers, are removed before being used to train the subsequent classifier model. Paper [28] shows that effective preprocessing of samples with corrupted labels improves the performance of traditional supervised algorithms on inaccurate supervision problems. Inspired by these two aspects, this paper puts forward a framework combining a label noise cleansing part and a deep learning algorithm to solve inaccurate supervision problems.
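As an illustration of the filter-style approach mentioned above, the following is a minimal sketch of a nearest-neighbor label noise filter. The majority-vote rule, the value of k, and the use of scikit-learn are assumptions for illustration and not the specific criterion of [11, 12].

```python
# Hypothetical sketch: nearest-neighbor label noise filter.
# A sample is flagged as probably mislabeled when its label disagrees with
# the majority label of its k nearest neighbors; flagged samples are removed
# before training the downstream classifier.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_noise_filter(X, y, k=5):
    """Return a boolean mask keeping samples consistent with their neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)          # column 0 is the point itself
    keep = np.empty(len(y), dtype=bool)
    for i, neighbors in enumerate(idx[:, 1:]):
        labels, counts = np.unique(y[neighbors], return_counts=True)
        keep[i] = (y[i] == labels[np.argmax(counts)])
    return keep

# Usage: mask = knn_noise_filter(X, y); X_filtered, y_filtered = X[mask], y[mask]
```

In contrast to such removal-based filters, the cleansing part of the proposed framework relabels suspected samples rather than discarding them.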

Background
The Proposed Algorithm
Case Study
Findings
Conclusion