A classification model is calibrated if its predicted probabilities of outcomes match the empirical accuracy of those predictions. Calibrating neural networks is critical in medical analysis applications, where clinical decisions rely on the predicted probabilities. Most calibration procedures, such as temperature scaling, operate as a post-processing step that uses held-out validation data. In practice, it is difficult to collect medical image data with correct labels because of the complexity of the data and the considerable variability across experts. This study presents a network calibration procedure that is robust to label noise. We draw on the fact that the confusion matrix of the noisy labels can be expressed as the matrix product of the confusion matrix of the clean labels and the label-noise transition matrix. The method estimates the noise level as part of a noise-robust training procedure, and the estimated noise level is then used to recover the network accuracy required by the calibration procedure. We show that despite the unreliable labels, we can still achieve calibration results on a par with those of a calibration procedure applied to data with reliable labels.
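As a sketch of the key relationship (the notation here is ours, not necessarily the paper's, and it assumes the label noise is independent of the model's prediction given the clean label): let $T$ be the label-noise transition matrix with $T_{ik} = P(\text{noisy label}=k \mid \text{clean label}=i)$, and let $C^{\text{clean}}_{ji} = P(\text{prediction}=j,\ \text{clean label}=i)$ and $C^{\text{noisy}}_{jk} = P(\text{prediction}=j,\ \text{noisy label}=k)$ be joint confusion matrices. Then
\[
C^{\text{noisy}} = C^{\text{clean}}\, T,
\qquad\text{so}\qquad
C^{\text{clean}} = C^{\text{noisy}}\, T^{-1}
\quad\text{and}\quad
\text{accuracy} = \operatorname{tr}\!\left(C^{\text{clean}}\right),
\]
which illustrates how an estimate of $T$ obtained during noise-robust training can supply the accuracy estimate needed by a post-hoc calibration step such as temperature scaling.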