LCNME: Label Correction Using Network Prediction Based on Memorization Effects for Cross-Modal Retrieval With Noisy Labels

Daiki Okamura, Masahiro Iwahashi, Ryosuke Harakawa

doi:10.1109/tcsvt.2023.3286546

Abstract

Cross-modal retrieval with noisy labels has attracted much attention. The state-of-the-art method trains a network with a loss that assigns larger weights to clean labels. However, we have found that the network eventually overfits to the remaining noisy labels as training progresses. Motivated by this finding, this paper proposes a method called Label Correction using Network prediction based on Memorization Effects (LCNME) to correct noisy labels. This differs from the state-of-the-art method, which leaves noisy labels in the training data. We assume that noisy labels are irrelevant to data features and realize label correction by using predicted labels (obtained by network prediction) instead of the given labels. However, because of memorization effects (the property whereby a network first learns cleanly labeled data and then learns noisily labeled data), predicted labels are contaminated by noisy labels from a certain epoch onward, called the change epoch. Although the change epoch is unknown in advance, we find that it can be identified by observing the loss on the noisy validation set. Using the change epoch, predicted labels can be generated without being affected by noisy labels. Extensive experiments show that LCNME accurately corrects noisy labels and achieves better cross-modal retrieval performance than existing methods.
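
The two steps described in the abstract, identifying the change epoch from the noisy validation loss and then taking predicted labels from that epoch, can be illustrated with a minimal sketch. The abstract does not state the exact identification rule, so this sketch assumes the change epoch is the epoch at which the noisy validation loss is lowest. The function names `find_change_epoch` and `correct_labels` and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def find_change_epoch(val_losses: Sequence[float]) -> int:
    """Return the 0-based epoch with the lowest noisy-validation loss.

    Assumption (not from the paper): the change epoch coincides with the
    minimum of the validation loss, after which the network starts to
    memorize noisy labels and the loss rises again.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    return min(range(len(val_losses)), key=lambda epoch: val_losses[epoch])


def correct_labels(class_probs: Sequence[Sequence[float]]) -> List[int]:
    """Replace given labels with the argmax of the network's predictions.

    `class_probs` holds one probability vector per training sample, taken
    from the network state saved at the change epoch.
    """
    return [
        max(range(len(probs)), key=lambda cls: probs[cls])
        for probs in class_probs
    ]


# Illustrative values only: the loss falls, then rises as noisy labels are memorized.
val_losses = [1.20, 0.95, 0.80, 0.78, 0.85, 0.97]
change_epoch = find_change_epoch(val_losses)

# Hypothetical predictions of the checkpoint saved at the change epoch.
probs_at_change_epoch = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
corrected = correct_labels(probs_at_change_epoch)

print(change_epoch, corrected)
```

In practice the predictions would come from a checkpoint saved at the selected epoch, so a checkpoint must be kept for every epoch (or for the best epoch seen so far) while training.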

Full Text