Abstract

Understanding model decisions under novel test scenarios is a central concern of the machine learning community. A common practice is to evaluate models on labeled test sets. However, in many real-world scenarios the test data are unlabeled, rendering the common supervised evaluation protocols infeasible. In this paper, we investigate this important but under-explored problem, named Automatic model Evaluation (AutoEval). Specifically, given a trained classifier, we aim to estimate its accuracy on various unlabeled test datasets. We construct a meta-dataset: a dataset comprised of datasets (sample sets) created from original images via various transformations such as rotation and background substitution. Correlation studies on the meta-dataset show that classifier accuracy exhibits a strong negative linear relationship with distribution shift, as measured by Pearson's correlation. This new finding inspires us to formulate AutoEval as a dataset-level regression problem. Specifically, we learn regression models (e.g., a regression neural network) to estimate classifier accuracy from overall feature statistics of a test set. In experiments, we show that the meta-dataset contains sufficient and diverse sample sets, allowing us to train robust regression models and report reasonable and promising predictions of classifier accuracy on various test sets. We also provide insights into the application scope, limitations, and potential future directions of AutoEval.
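
The following is a minimal sketch of the dataset-level regression idea described above, not the authors' implementation. It assumes features for each sample set have already been extracted with the trained classifier's (frozen) backbone, and that meta-dataset accuracies are available; the summary statistics (mean plus flattened covariance) and the MLP regressor architecture are illustrative choices.

```python
import numpy as np
import torch
import torch.nn as nn


def dataset_statistics(features: np.ndarray) -> np.ndarray:
    """Summarize one test set by first- and second-order feature statistics
    (mean and flattened covariance), yielding a fixed-length vector."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return np.concatenate([mu, cov.ravel()])


class AccuracyRegressor(nn.Module):
    """Small MLP mapping dataset-level statistics to predicted accuracy."""

    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # accuracy lies in [0, 1]
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def train_regressor(stats: np.ndarray, accs: np.ndarray, epochs: int = 200):
    """Fit the regressor on the meta-dataset: one (statistics, accuracy)
    pair per transformed sample set."""
    x = torch.tensor(stats, dtype=torch.float32)
    y = torch.tensor(accs, dtype=torch.float32)
    model = AccuracyRegressor(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model
```

At test time, one would compute `dataset_statistics` on an unlabeled test set's features and pass the result through the trained regressor to obtain an accuracy estimate, with no labels required.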
