Abstract

Cross-modal image-text retrieval (CMITR) has been a high-value research topic for more than a decade. Most previous studies train on the data for all tasks as a single set. In practice, however, a dataset often comprises multiple tasks that arrive and are trained in sequence. Under this setting, the model's ability to remember an old task degrades once a new task arrives; in other words, it suffers catastrophic forgetting. To address this issue, this paper proposes a novel continual learning method for cross-modal image-text retrieval (CLCMR) that alleviates catastrophic forgetting. We construct a multilayer domain-selective attention (MDSA) based network that acquires knowledge at both task-relevant and domain-specific attention levels. Moreover, we design a memory factor to achieve weight regularization and employ a novel memory loss function to constrain MDSA. Extensive experimental results on multiple datasets (Wikipedia, Pascal Sentence, and PKU XMediaNet) demonstrate that CLCMR effectively alleviates catastrophic forgetting and achieves superior continual learning ability compared with state-of-the-art methods.
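
The abstract does not give the exact form of the memory loss, but memory-factor weight regularization of this kind is typically a quadratic penalty that discourages parameters important to earlier tasks from drifting. A minimal PyTorch sketch under that assumption follows; the names memory_regularized_loss, old_params, and memory_factor are hypothetical illustrations, not the paper's actual API:

```python
import torch

def memory_regularized_loss(task_loss, model, old_params, memory_factor, lam=1.0):
    """Hypothetical memory-factor weight regularization (a sketch, not CLCMR's exact loss).

    task_loss:     retrieval loss on the current task (e.g., a ranking loss)
    old_params:    dict of parameter tensors snapshotted after the previous task
    memory_factor: dict of per-parameter importance weights (the "memory factor")
    lam:           trade-off between plasticity (new task) and stability (old tasks)
    """
    penalty = torch.tensor(0.0)
    for name, p in model.named_parameters():
        if name in old_params:
            # Penalize drift from the old-task weights, scaled per parameter
            # by how important that parameter was to previous tasks.
            penalty = penalty + (memory_factor[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + lam * penalty
```

A penalty of this shape keeps weights that matter for old tasks close to their previous values while leaving less important weights free to adapt to the new task; the paper's actual memory factor and memory loss may be defined differently.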
