Abstract

Multi-task learning is a successful learning framework that improves prediction models by leveraging knowledge shared among related tasks. Referring expression comprehension (REC) and referring expression segmentation (RES) are closely related, both being language-guided visual recognition tasks, yet their relations have not been fully exploited in previous work. In this paper, we propose a Multiple Relational Learning Network (MRLN) for multi-task learning of REC and RES. First, a feature-feature interaction learning module is introduced to handle the complicated interactions among features. Second, a feature-task dependence learning module associates the relevant features with each target task. Third, a task-task relationship learning module automatically captures the relationships among tasks and adaptively guides the fine-tuning of REC and RES. To verify the proposed approach, experiments are conducted on three benchmark datasets, i.e., RefCOCO, RefCOCO+, and RefCOCOg. Extensive experiments demonstrate that modeling these multiple relationships alleviates the prediction-inconsistency issue in the multi-task setup. Moreover, MRLN achieves significant performance gains over most existing methods, reaching up to 83.46% for REC and 63.62% for RES, which demonstrates its validity and superiority.
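The abstract does not specify how the task-task relationship learning module weights the REC and RES objectives; a common mechanism in multi-task setups is uncertainty-based loss weighting (in the style of Kendall et al.), sketched below as a minimal illustration. All function and variable names here are hypothetical, and the loss values are placeholders, not results from the paper.

```python
import math

def weighted_multitask_loss(losses, log_vars):
    """Combine per-task losses with learnable uncertainty weights.

    losses: dict mapping task name -> scalar loss value
    log_vars: dict mapping task name -> learnable log-variance s_t

    Each task contributes exp(-s_t) * L_t + s_t to the total, so tasks
    with high estimated uncertainty (large s_t) are down-weighted
    automatically while the +s_t term discourages trivially large s_t.
    """
    return sum(math.exp(-log_vars[t]) * losses[t] + log_vars[t]
               for t in losses)

# Hypothetical REC/RES losses and initial log-variances (s_t = 0
# gives unit weights, so the total is just the plain sum).
losses = {"rec": 1.2, "res": 0.8}
log_vars = {"rec": 0.0, "res": 0.0}
total = weighted_multitask_loss(losses, log_vars)  # 1.2 + 0.8 = 2.0
```

In practice the `log_vars` entries would be trainable parameters updated jointly with the network, letting the balance between the detection-style REC loss and the segmentation-style RES loss adapt during training.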
