Abstract

Confidence calibration is a rapidly growing area of research in deep learning, including computer vision applications. New model architectures and loss functions are being introduced to improve model calibration, an important topic for safety-critical AI applications. New methods are frequently evaluated against simple baselines that serve as reference points for measuring their effectiveness. Popular baselines for evaluating model confidence calibration include label smoothing, mixup, and dropout. Despite these methods' frequent use and simple implementations, the parameter values that define them are rarely validated and are usually left at default values. This paper demonstrates the danger of using these default values in calibration benchmarks on common datasets: poor model calibration, specifically a model that is consistently over- or underconfident. We present an adaptive framework that adjusts the parameter values of these baseline methods during training, based on validation accuracy and confidence, to maintain good model calibration and balance over- and underconfidence. A disadvantage of binning-based calibration metrics is that they are not differentiable and therefore cannot be added to the training loss to penalize calibration error. Our adaptive framework addresses this problem and provides a practical means of optimizing calibration methods. Experiments on the CIFAR-10 and CIFAR-100 image classification datasets show that our approach improves model calibration compared to using these popular methods with default values, achieving good calibration regardless of model architecture and dataset. Further, our analysis provides a comparison of these baseline methods that can inform future confidence calibration research.
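
The abstract does not spell out the update rule, but the idea of adjusting a baseline method's parameter from validation accuracy and confidence can be illustrated with a minimal Python sketch. The function name, step size, and bounds below are illustrative assumptions rather than the authors' implementation, using a label-smoothing factor as the example parameter.

def adapt_smoothing(smoothing, val_accuracy, val_mean_confidence,
                    step=0.01, min_val=0.0, max_val=0.3):
    """Return an updated label-smoothing factor for the next epoch.

    If mean confidence exceeds accuracy the model is overconfident, so
    smoothing is increased to pull confidence down; if accuracy exceeds
    mean confidence the model is underconfident, so smoothing is reduced.
    Step size and bounds are illustrative, not from the paper.
    """
    gap = val_mean_confidence - val_accuracy
    if gap > 0:
        smoothing += step
    elif gap < 0:
        smoothing -= step
    return max(min_val, min(max_val, smoothing))

In the same spirit, the adapted parameter could be a mixup alpha or a dropout rate; the appropriate update direction then depends on how that parameter affects model confidence.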
