Abstract

The success of deep convolutional networks on image and text classification and recognition tasks depends on the availability of large, correctly labeled training datasets, but obtaining the correct labels for these gigantic datasets is very difficult task. To deal with this problem, we describe an approach for learning deep networks from datasets corrupted by unknown label noise. We append a nonlinear noise model to a standard deep network, which is learned in tandem with the parameters of the network. Further, we train the network using a loss function that encourages the clustering of training images. We argue that the non-linear noise model, while not rigorous as a probabilistic model, results in a more effective denoising operator during backpropagation. We evaluate the performance of proposed approach on image classification task with artificially injected label noise to MNIST, CIFAR-10, CIFAR-100 and ImageNet datasets and on a large-scale Clothing 1M dataset with inherent label noise. Further, we show that with the different initialization and the regularization of the noise model, we can apply this learning procedure to text classification tasks as well. We evaluate the performance of modified approach on TREC text classification dataset. On all these datasets, the proposed approach provides significantly improved classification performance over the state of the art and is robust to the amount of label noise and the training samples. This approach is computationally fast, completely parallelizable, and easily implemented with existing machine learning libraries.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.