Revisiting Hard Negative Mining in Contrastive Learning for Visual Understanding

Hao Zhang, Chunyan Feng, Xin Wang, Jiahui Yang, Zheng Li, Caili Guo

doi:10.3390/electronics12234884


Open Access

https://doi.org/10.3390/electronics12234884


Abstract

Efficiently mining and distinguishing hard negatives is the key to Contrastive Learning (CL) in various visual understanding tasks. By properly emphasizing the penalty of hard negatives, Hard Negative Mining (HNM) can improve CL performance. However, no existing method quantitatively analyzes the penalty strength of hard negatives, which makes training difficult to converge. In this paper, we propose a method for measuring and controlling the penalty strength. We first define a penalty strength metric that provides a quantitative analysis tool for HNM. Then, we propose a Triplet loss with Penalty Strength Control (T-PSC), which balances the penalty strength of hard negatives against the difficulty of model optimization. To verify the effectiveness of the proposed T-PSC method in different modalities, we apply it to two visual understanding tasks: Image–Text Retrieval (ITR) for multi-modal processing, and Temporal Action Localization (TAL) for video processing. T-PSC can be applied to existing ITR and TAL models in a plug-and-play manner without any changes. Experiments combined with existing models show that reasonable control of the penalty strength can speed up training and improve performance on higher-level tasks.
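For readers unfamiliar with the setting, the following minimal sketch shows a standard hinge-based triplet loss with and without hardest-negative mining. It is not the T-PSC loss, whose definition is not given in the abstract. The function name `triplet_loss`, the `margin` value, and the toy similarity matrix are all illustrative assumptions.

```python
def triplet_loss(sim, margin=0.2, hardest_only=True):
    """Hinge-based triplet loss over a square similarity matrix.

    sim[i][j] is the similarity between query i and candidate j; the
    diagonal holds the positive pairs. This is a generic illustration,
    not the paper's T-PSC formulation.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hinge penalty for every non-matching candidate of query i.
        penalties = [max(0.0, margin + sim[i][j] - pos)
                     for j in range(n) if j != i]
        if hardest_only:
            # Hard negative mining: keep only the largest violation.
            total += max(penalties)
        else:
            # No mining: every negative contributes to the loss.
            total += sum(penalties)
    return total / n


# Toy 3x3 similarity matrix (hypothetical values).
sim = [
    [0.90, 0.80, 0.75],
    [0.20, 0.70, 0.30],
    [0.10, 0.10, 0.60],
]
loss_hard = triplet_loss(sim, hardest_only=True)
loss_all = triplet_loss(sim, hardest_only=False)
```

In this toy case only the first query violates the margin, so the hardest-negative variant counts one penalty for it while the summed variant counts both. How strongly such hard negatives are weighted is the quantity the paper proposes to measure and control.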

Full Text