Softmax Pooling for Super Visual Semantic Embedding

Zhixian Zeng, Jianjun Cao, Yizhuo Rao, Guoquan Jiang, Nianfeng Weng, Yuxin Xu

doi:10.1109/iemcon53756.2021.9623131

Abstract

The purpose of visual-semantic embedding is to map images and text into a common embedding space and to learn cross-modal semantic alignment. Image-text matching is the main research task in visual-semantic embedding. Existing research has confirmed that a simple pooling strategy can achieve good performance in visual-semantic embedding. However, existing visual-semantic pooling strategies (aggregators) generally suffer from several problems, including additional training parameters, increased training time, and neglect of intra-modal semantic-related information. In this paper, we propose a Super Visual Semantic Embedding (SVSE) model based on Softmax Pooling (SoftPool). We introduce the softmax pooling strategy into visual-semantic embedding for the first time. SoftPool is simple to implement and introduces no additional training parameters. It adaptively calculates the weights between different feature values and preserves more intra-modal correlation information between different features. In addition, we combine an enhanced semantic representation module with our softmax pooling strategy to construct intra-modal semantic associations, which improve the performance of visual-semantic embedding in image-text matching. We therefore argue that the proposed method has higher engineering application value than other methods. Experiments are conducted on two widely used cross-modal image-text datasets, MS-COCO and Flickr-30K. Compared with the best pooling strategy, our softmax pooling strategy requires less training time and improves R@1 (image retrieval) by 0.48% (5K) on MS-COCO and 1.95% on Flickr-30K. Moreover, compared with the best visual-semantic embedding model, our proposed SVSE improves R@1 (image retrieval) by 2.83% (5K) on MS-COCO and 4.89% (1K) on Flickr-30K.
Our code is available at https://github.com/zengzhixian/SoftPool_SVSE.git.
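
The abstract describes SoftPool as a parameter-free aggregator that weights feature values adaptively. A minimal sketch of that idea follows, assuming the common softmax-pooling form in which each feature value is weighted by a softmax taken over the set of local features (image regions or words) in the same dimension. The exact formulation used in SVSE is not given in the abstract, so this may differ from the authors' code; the function name `softmax_pool` and the feature shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax_pool(features, axis=0):
    """Aggregate a set of local features into one vector without learned parameters.

    Assumed form (not confirmed by the abstract): for each dimension, the
    weights are a softmax over the values along `axis`, so larger values
    contribute more while smaller values still contribute.
    """
    features = np.asarray(features, dtype=np.float64)
    # Subtract the per-dimension maximum for numerical stability.
    shifted = features - features.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=axis, keepdims=True)
    # Weighted sum of the original (unshifted) feature values.
    return (weights * features).sum(axis=axis)


# Hypothetical input: 36 region features of dimension 1024.
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 1024))
image_embedding = softmax_pool(regions, axis=0)
print(image_embedding.shape)  # (1024,)
```

Under this assumed form, the pooled value in each dimension lies between the result of average pooling and the result of max pooling, which is one way to read the claim that more information is preserved across different features.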

Similar Papers

Learning the Best Pooling Strategy for Visual Semantic Embedding
Jiacheng Chen ... Changhu Wang
01 Jun 2021

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
Haoran Wang ... Ying Zhang
01 Jan 2020

MM-Stega: Multi-modal Steganography Based on Text-Image Matching
Yuting Hu ... Haoyun Li
01 Jan 2020

Dissecting Deep Metric Learning Losses for Image-Text Retrieval
Hong Xuan ... Xi Stephen Chen
01 Jan 2023
