Abstract
The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may hence bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. On the duplicate-free test sets, we find a significant drop in classification accuracy of between 9% and 14% relative to the original performance. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art.
Highlights
Almost ten years after the first instantiation of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [1], image classification is still a very active field of research
We find that 3.3% of CIFAR-10 test images and a startling 10% of CIFAR-100 test images have near-duplicates in their respective training sets
We found 891 duplicates from the CIFAR-100 test set in the training set and 104 duplicates within the test set itself
Summary
Almost ten years after the first instantiation of the ImageNet Large Scale Visual Recognition Challenge, image classification is still a very active field of research. With a growing number of duplicates between training and test data, we run the risk of comparing models in terms of their capability of memorizing the training data, which increases with model capacity. This is especially problematic when the difference between the error rates of different models is as small as it is nowadays, i.e., sometimes just one or two percent points. The significance of these performance differences depends on the overlap between test and training data. We assess the number of test images that have near-duplicates in the training set for two of the most heavily benchmarked datasets in computer vision: CIFAR-10 and CIFAR-100 [12]. Recht et al. [18] have recently sampled a completely new test set for CIFAR-10 from Tiny Images to assess how well existing models generalize to truly unseen data. We make the list of duplicates found, the new test sets, and pre-trained models publicly available at https://cvjena.github.io/cifair/
Imaging 2020, 6, 41
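The duplicate search described above can be approximated by a nearest-neighbor scan in some feature space: for every test image, find its closest training image and flag the pair as a near-duplicate candidate when the similarity exceeds a threshold. The sketch below is a minimal illustration of that idea using cosine similarity over precomputed feature vectors; the function name, the threshold value, and the choice of feature extractor are assumptions for illustration, not the authors' exact procedure (which also involved manual annotation of candidates).

```python
import numpy as np

def near_duplicate_candidates(train_feats, test_feats, threshold=0.95):
    """Flag test images whose nearest training image exceeds a
    cosine-similarity threshold.

    train_feats, test_feats: 2-D arrays of shape (n_images, n_features),
    e.g. CNN embeddings computed upstream. Returns a list of tuples
    (test_index, nearest_train_index, similarity)."""
    # L2-normalize rows so a dot product equals cosine similarity
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T              # (n_test, n_train) similarity matrix
    nearest = sims.argmax(axis=1)      # index of the closest training image
    best = sims.max(axis=1)            # similarity to that closest image
    return [(i, int(nearest[i]), float(best[i]))
            for i in range(len(test)) if best[i] >= threshold]
```

In practice the full 10,000 x 50,000 similarity matrix for CIFAR-scale data still fits in memory with low-dimensional features, so a single matrix product suffices; for larger datasets one would chunk the test set or use an approximate nearest-neighbor index instead.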