Abstract
Cross-modal matching is one of the most fundamental and widely studied tasks in the field of data science. Attention mechanisms have recently been widely used to better capture the complicated correspondences between modalities. In this paper, we propose a novel Dual Gated Attention Fusion (DGAF) unit that spares cross-modal matching from heavy attention computation. Specifically, the attention unit in the main information flow is replaced with a single-head, low-dimensional, lightweight attention bypass, which serves as a gate that selectively discards noise in both modalities. To strengthen the interaction between modalities, an auxiliary memory unit is appended, and a gated memory fusion unit is designed to fuse the memorized inter-modality information into both modality streams. Extensive experiments on two benchmark datasets show that the proposed DGAF achieves a good balance between efficiency and effectiveness.