Confident Learning: Estimating Uncertainty in Dataset Labels

Curtis Northcutt, Lu Jiang, Isaac Chuang

doi:10.1613/jair.1.12125

Open Access

https://doi.org/10.1613/jair.1.12125

Abstract

Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine them, building on the assumption of a class-conditional noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This results in a generalized CL which is provably consistent and experimentally performant. We present sufficient conditions where CL exactly finds label errors, and show CL performance exceeding seven recent competitive approaches for learning with noisy labels on the CIFAR dataset. Uniquely, the CL framework is not coupled to a specific data modality or model (e.g., we use CL to find several label errors in the presumed error-free MNIST dataset and improve sentiment classification on text data in Amazon Reviews). We also employ CL on ImageNet to quantify ontological class overlap (e.g., estimating 645 missile images are mislabeled as their parent class projectile), and moderately increase model accuracy (e.g., for ResNet) by cleaning data prior to training. These results are replicable using the open-source cleanlab release.
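The "counting with probabilistic thresholds" step can be illustrated with a minimal sketch. It is not the cleanlab API, and the function name `confident_joint`, the toy data, and the threshold rule (the mean predicted probability of class j over examples given label j) are assumptions made for illustration; the abstract itself does not spell out these details. An example is counted under class j only if its predicted probability for j reaches that class's threshold, and off-diagonal counts mark candidate label errors.

```python
def confident_joint(labels, pred_probs):
    """Count examples by (given label, confidently predicted label).

    Hypothetical helper for illustration only; not part of cleanlab.
    labels: list of int class indices (the noisy, given labels).
    pred_probs: list of per-class probability lists, ideally out-of-sample.
    """
    num_classes = len(pred_probs[0])

    # Assumed threshold rule: for class j, the mean predicted probability of j
    # over examples whose given label is j (each class must appear at least once).
    thresholds = []
    for j in range(num_classes):
        probs_j = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j))

    counts = [[0] * num_classes for _ in range(num_classes)]
    suspects = []
    for idx, (given, probs) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability clears the class threshold.
        confident = [j for j in range(num_classes) if probs[j] >= thresholds[j]]
        if not confident:
            continue  # no confident class: the example is left uncounted
        likely = max(confident, key=lambda j: probs[j])
        counts[given][likely] += 1
        if likely != given:
            suspects.append(idx)  # off-diagonal entry: candidate label error
    return counts, suspects


# Toy data: example 2 is labeled 0 but confidently predicted as class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.45, 0.55],
]
counts, suspects = confident_joint(labels, pred_probs)
print(counts)    # rows: given label, columns: confidently predicted label
print(suspects)  # indices of candidate label errors
```

Dividing the counts by their total gives a rough estimate of the joint distribution between given and true labels; in this sketch, the last example falls below both thresholds and is left out of the counts.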

Journal: Journal of Artificial Intelligence Research
Publication Date: Apr 14, 2021
Citations: 311
License type: CC BY

Similar Papers

Iterative Learning with Open-set Noisy Labels
Yisen Wang ... Hongyuan Zha
01 Jun 2018

Self-Augmentation Based on Noise-Robust Probabilistic Model for Noisy Labels
Byoung Woo Park ... Sung Woo Park
IEEE Access | VOL. 10
01 Jan 2021

Characterizing Label Errors: Confident Learning for Noisy-Labeled Image Segmentation
Minqing Zhang ... Zhen Li
01 Jan 2020

Contrastive label correction for noisy label learning
Bin Huang ... Chaoyang Xu
Information Sciences | VOL. 611
19 Aug 2022
