Abstract

Context: Assessing the accuracy in operation of a Machine Learning (ML) system for image classification on arbitrary (unlabeled) inputs is hard. This is due to the oracle problem, which hinders the ability to automatically judge the output of a classification, and thus the accuracy of the assessment, when previously unseen unlabeled inputs are submitted to the system.

Objective: We propose the Image Classification Oracle Surrogate (ICOS), a technique to automatically evaluate the accuracy in operation of image classifiers based on Convolutional Neural Networks (CNNs).

Method: To establish whether the classification of an arbitrary image is correct, ICOS leverages three knowledge sources: operational input data, training data, and the ML algorithm. Knowledge is expressed through likely invariants, i.e., properties that should not be violated by correct classifications. ICOS infers and filters invariants to improve the detection of misclassifications while reducing the number of false positives. We evaluate ICOS experimentally on twelve CNNs, using the popular MNIST, CIFAR10, CIFAR100, and ImageNet datasets, and compare it to two alternative strategies, namely cross-referencing and self-checking.

Results: Experimental results show that ICOS achieves accuracy comparable to the alternative strategies, with higher stability across a variety of CNNs and datasets of different complexity and size.

Conclusions: ICOS likely invariants are effective in automatically detecting misclassifications by CNNs used in image classification tasks when the expected output is unknown; ICOS thus yields faithful assessments of their accuracy in operation. Knowledge about input data can also be incorporated manually into ICOS to increase robustness against unexpected phenomena in operation, such as label shift.
