Abstract

Finding general evaluation metrics for unsupervised representation learning techniques is a challenging open research question, which has recently become increasingly necessary due to the growing interest in unsupervised methods. Even though these methods promise beneficial representation characteristics, most approaches currently suffer from the objective function mismatch: the performance on a desired target task can decrease when the unsupervised pretext task is trained for too long, especially when both tasks are ill-posed. In this work, we build upon the widely used linear evaluation protocol and define new general evaluation metrics to quantitatively capture the objective function mismatch and the more generic metrics mismatch. We discuss the usability and stability of our protocols on a variety of pretext and target tasks and study mismatches in a wide range of experiments. Thereby we disclose dependencies of the objective function mismatch across several pretext and target tasks with respect to the pretext model's representation size, target model complexity, pretext and target augmentations, as well as pretext and target task types. In our experiments, we find that the objective function mismatch reduces performance by ~0.1–5.0% for Cifar10, Cifar100 and PCam in many setups, and by up to ~25–59% in extreme cases for the 3dshapes dataset.
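To make the linear evaluation protocol referred to above concrete, here is a minimal sketch under assumed interfaces: the pretext encoder is frozen at a given checkpoint and a linear classifier is fit on its representations. The `encode` callable, the scikit-learn classifier and the accuracy metric are illustrative choices, not necessarily the authors' exact setup.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def linear_probe_accuracy(encode, X_train, y_train, X_test, y_test):
    """Linear evaluation: fit a linear classifier on frozen representations.

    `encode` is assumed to map raw inputs to the representation vectors of
    the frozen pretext model at one training checkpoint.
    """
    Z_train = encode(X_train)   # frozen features for the training split
    Z_test = encode(X_test)     # frozen features for the test split
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return accuracy_score(y_test, clf.predict(Z_test))
```

Evaluating such a probe at every pretext-training checkpoint yields the target-score curve on which mismatch metrics can then be computed.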

Highlights

  • Unsupervised Representation Learning is a promising approach to learn useful features from huge amounts of data without human annotation effort

  • Thereby we disclose dependencies of the objective function mismatch across several pretext and target tasks with respect to the pretext model’s representation size, target model complexity, pretext and target augmentations as well as pretext and target task types

  • We propose hard and soft versions of general metrics to measure and compare mismatches of representation learning methods across different target tasks (Sect. 3 and 4)


Summary

Introduction

Unsupervised Representation Learning is a promising approach to learn useful features from huge amounts of data without human annotation effort. A common evaluation pattern is to train an unsupervised pretext model on different datasets and to test its performance on several target tasks. Because of the huge variety of target tasks and preferred representation characteristics, the evaluation of these methods is challenging. Our contributions are:

  • We propose hard and soft versions of general metrics to measure and compare mismatches of (unsupervised) representation learning methods across different target tasks (a rough illustration of the idea follows below)

  • We discuss the usability and stability of our protocols on a variety of pretext and target tasks

  • In our experiments we qualitatively show dependencies of the objective function mismatch with respect to the pretext model's representation size

  • We find that the objective function mismatch can reduce performance on various benchmarks: we observe a performance decrease of ~0.1–5.0% for Cifar10, Cifar100 and PCam, and of up to ~25–59% in extreme cases for the 3dshapes dataset (Sect. 6)
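As a rough, hypothetical illustration of the hard metric's intuition (not the paper's exact definition from Sect. 3 and 4): if the target-task score is measured at successive pretext-training checkpoints, the mismatch can be read off as the relative drop from the best checkpoint to the final one. The function name and the relative formulation below are assumptions for illustration only.

```python
import numpy as np

def hard_mismatch_sketch(target_scores):
    """Illustrative mismatch measure (an assumption, not the paper's formula).

    target_scores: target-task performance (e.g. linear-probe accuracy)
    measured at successive pretext-training checkpoints.
    Returns the relative drop from the best checkpoint to the last one;
    0 means that longer pretext training never hurt the target task.
    """
    scores = np.asarray(target_scores, dtype=float)
    return (scores.max() - scores[-1]) / scores.max()

# Example: target accuracy peaks mid-training, then degrades again
print(hard_mismatch_sketch([0.60, 0.72, 0.75, 0.71, 0.68]))  # ~0.093, i.e. ~9% lost
```

The paper additionally defines soft variants; the sketch above is only meant to convey why training the pretext task for longer can register as a mismatch on the target task.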

Unsupervised representation learning
Analyzing unsupervised representation learning
Hard metrics mismatch
Hard objective function mismatch
Soft metrics mismatch
Soft objective function mismatch
Experimental setup
Evaluation
Mismatch and convergence
Stability
Dependence on representation size
Dependence on target model complexity
Dependence on augmentations
Dependence on target task type
Applying our metrics to ResNet models
Future work
Conclusion
