Inaccurate Labels in Weakly-Supervised Deep Learning: Automatic Identification and Correction and Their Impact on Classification Performance.

Degan Hao,Shandong Wu,Aly Mohamed,Jules Sumkin,Lei Zhang

doi:10.1109/jbhi.2020.2974425

Open Access

https://doi.org/10.1109/jbhi.2020.2974425

Abstract

In data-driven deep learning-based modeling, data quality can substantially influence classification performance, so correct data labeling is critical. In weakly-supervised learning, a key challenge lies in dealing with potentially inaccurate or mislabeled training data. In this paper, we propose an automated methodological framework to identify mislabeled data using two metric functions: the cross-entropy loss, which indicates the divergence between a prediction and the ground truth, and the influence function, which reflects the dependence of a model on individual data points. After correcting the identified mislabels, we measured their impact on classification performance. We also compared the mislabeling effects in three experiments on two different real-world clinical questions. A total of 10,500 images were studied in the contexts of clinical breast density category classification and breast cancer malignancy diagnosis. We used intentionally flipped labels as mislabels to evaluate the proposed method at varying proportions of mislabeled data included in model training. We also compared the effects of our method to two published schemes for breast density category classification. Experimental results show that when the dataset contains 10% mislabeled data, our method can automatically identify up to 98% of these mislabeled data by examining the top 30% of the full dataset. Furthermore, we show that correcting the identified mislabels leads to an improvement in classification performance. Our method provides a feasible solution for weakly-supervised deep learning modeling in dealing with inaccurate labels.
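
The loss-based half of the idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probabilities, and the single flipped label are all hypothetical, and the influence-function metric is omitted. It assumes a trained classifier has already produced per-class probabilities, ranks samples by per-sample cross-entropy against their (possibly wrong) labels, and flags the top fraction for review.

```python
import numpy as np


def per_sample_cross_entropy(probs, labels, eps=1e-12):
    """Cross-entropy of each sample: -log(probability assigned to its given label)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, eps, 1.0))


def flag_suspects(losses, fraction):
    """Return indices of the highest-loss samples, covering `fraction` of the data."""
    n_flag = max(1, int(round(fraction * len(losses))))
    order = np.argsort(-np.asarray(losses), kind="stable")  # descending loss
    return order[:n_flag]


def recall_of_flags(flagged, true_mislabeled):
    """Share of the truly mislabeled samples that appear among the flagged ones."""
    true_set = {int(i) for i in true_mislabeled}
    if not true_set:
        return 0.0
    hits = len(true_set & {int(i) for i in flagged})
    return hits / len(true_set)


# Hypothetical model outputs for 10 samples of a two-class problem.
probs = np.array([
    [0.90, 0.10], [0.80, 0.20], [0.20, 0.80], [0.10, 0.90], [0.70, 0.30],
    [0.95, 0.05], [0.30, 0.70], [0.85, 0.15], [0.15, 0.85], [0.60, 0.40],
])
clean_labels = np.array([0, 0, 1, 1, 0, 0, 1, 0, 1, 0])

# Simulate a 10% mislabel rate by flipping one label on purpose.
mislabeled_idx = [5]
noisy_labels = clean_labels.copy()
noisy_labels[mislabeled_idx] = 1 - noisy_labels[mislabeled_idx]

losses = per_sample_cross_entropy(probs, noisy_labels)
flagged = flag_suspects(losses, fraction=0.3)  # review the top 30% by loss
recall = recall_of_flags(flagged, mislabeled_idx)
print("flagged indices:", flagged.tolist(), "recall:", recall)
```

In this toy setting the flipped sample receives a low probability for its noisy label and therefore the largest loss, so it lands in the reviewed subset. Real models can also memorize wrong labels, which is one reason a second metric such as the influence function may be useful.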

Journal: IEEE Journal of Biomedical and Health Informatics	Publication Date: Feb 17, 2020
Citations: 75	License type: publisher-specific-oa

Similar Papers

Abstract 184: The utility of deep metric learning for breast cancer identification on mammographic images
Justin Du ... Enoch Chang
Cancer Research | VOL. 81
01 Jul 2021

Combining radiomics and deep learning features of intra-tumoral and peri-tumoral regions for the classification of breast cancer lung metastasis and primary lung cancer with low-dose CT.
Lei Li ... Jian Zheng
Journal of cancer research and clinical oncology | VOL. 149
29 Aug 2023

Combining Deep Learning and Handcrafted Radiomics for Classification of Suspicious Lesions on Contrast-enhanced Mammograms.
Manon P L Beuque ... Henry C Woodruff
Radiology | VOL. 307
01 Jun 2023

Improving Deep Learning hydrological time series modeling using Gaussian Filter preprocessing
Rahim Barzegar ... Jan Adamowski
03 Mar 2021
