Target-Oriented Deformation of Visual-Semantic Embedding Space

Takashi Matsubara

doi:10.1587/transinf.2020mup0003

Abstract

Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and carry modality-specific information, simply embedding related entities close to each other is insufficient. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby adjusting the similarities between entities. Unlike methods based on cross-modal attention, TOD-Net is a post-process applied to the embedding space learned by existing embedding systems, and it improves their retrieval performance. In particular, when combined with cutting-edge models, TOD-Net yields a state-of-the-art cross-modal retrieval model on the MSCOCO dataset. Qualitative analysis reveals that TOD-Net successfully emphasizes entity-specific concepts and retrieves diverse targets by handling higher levels of diversity than existing models.
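
The idea of deforming an embedding under a condition before measuring similarity can be illustrated with a small sketch. The abstract does not specify the architecture of TOD-Net, so everything below is an assumption made for illustration: the sigmoid gating form, the untrained random weights, the choice to deform the query conditioned on the target, and the names `make_deformer`, `deform`, and `conditioned_similarity` are all hypothetical and are not taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_deformer(dim, seed=0, scale=0.1):
    """Build a toy condition-dependent deformation (hypothetical, untrained).

    The weights are random; in a real system they would be learned on top
    of a pre-trained embedding model.
    """
    rng = random.Random(seed)
    # One weight row per output dimension, mapping the condition to a gate.
    w_gate = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def deform(x, condition):
        # Sigmoid gate per dimension, computed from the condition only.
        gates = [
            1.0 / (1.0 + math.exp(-sum(w * c for w, c in zip(row, condition))))
            for row in w_gate
        ]
        # Residual-style rescaling: each coordinate of x is stretched by a
        # factor in (1, 2) that depends continuously on the condition.
        return [xi * (1.0 + g) for xi, g in zip(x, gates)]

    return deform


def conditioned_similarity(deform, query, target):
    """Similarity measured after deforming the query under the target
    (assumed direction of conditioning)."""
    return cosine(deform(query, target), target)
```

Because the deformation acts only on embeddings that already exist, a module of this kind can be attached to a pre-trained embedding system as a post-process, which is the property the abstract contrasts with cross-modal attention.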

Journal: IEICE Transactions on Information and Systems
Publication Date: Jan 1, 2021
Citations: 5
License type: free

Similar Papers

Neural Retrieval with Partially Shared Embedding Spaces
Bo Li ... Le Jia
17 Oct 2018

Supervised contrastive learning over prototype-label embeddings for network intrusion detection
Manuel Lopez-Martin ... Belen Carro
Information Fusion | VOL. 79
20 Sep 2021

Simple to complex cross-modal learning to rank
Minnan Luo ... Qinghua Zheng
Computer Vision and Image Understanding | VOL. 163
08 Jul 2017

Deep multimodal embedding: Manipulating novel objects with point-clouds, language and trajectories
Jaeyong Sung ... Ian Lenz
01 May 2017
