A Visually Enhanced Neural Encoder for Synset Induction

Guang Chen, Guangwei Zhang, Ruifan Li, Xiaoxu Li, Fangxiang Feng

doi:10.3390/electronics12163521

Abstract

The synset induction task is to automatically cluster semantically identical instances, which are often represented by texts and images. Previous works mainly consider the textual parts while ignoring their visual counterparts. However, effectively employing visual information to enhance the semantic representation for synset induction is challenging. In this paper, we propose a Visually Enhanced NeUral Encoder (VENUE) to learn a multimodal representation for the synset induction task. The key insight lies in constructing multimodal representations through intra-modal and inter-modal interactions among images and text. Specifically, we first design a visual interaction module that uses the attention mechanism to capture the correlation among images and, to obtain multi-granularity textual representations, we fuse the pre-trained tags and word embeddings. Second, we design a masking module to filter out weakly relevant visual information. Third, we present a gating module to adaptively regulate the modalities’ contributions to semantics. A triplet loss is adopted to train the VENUE encoder to learn discriminative multimodal representations. We then apply clustering algorithms to the obtained representations to induce synsets. To verify our approach, we collect a multimodal dataset, MMAI-Synset, and conduct extensive experiments. The experimental results demonstrate that our method outperforms strong baselines on three groups of evaluation metrics.
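The gating module and the triplet loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: a single scalar sigmoid gate over the concatenated text and image vectors, Euclidean distance, a margin of 1.0, and the names `gated_fusion` and `triplet_loss`. It is not the VENUE implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, gate_weights, gate_bias):
    """Blend a text vector and an image vector with one scalar gate.

    Assumed form (not taken from the paper):
        g = sigmoid(w . [t; v] + b)
        fused = g * t + (1 - g) * v
    """
    concat = list(text_vec) + list(image_vec)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, image_vec)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical toy inputs: with zero gate weights the gate is 0.5,
# so the fused vector is the plain average of the two modalities.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4, 0.0)
loss_separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
loss_violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In this sketch, instances of the same synset would serve as anchor and positive, and an instance from a different synset as the negative, so that training pulls same-synset representations together before clustering.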

Journal: Electronics
Publication Date: Aug 20, 2023
License type: CC BY 4.0

Similar Papers

Learning Multimodal Word Representations by Explicitly Embedding Syntactic and Phonetic Information
Wenhao Zhu ... Xiaoya Yin
IEEE Access | VOL. 8
01 Jan 2020

Automatic text summarization of konkani texts using pre-trained word embeddings and deep learning
Jovi D’Silva ... Uzzal Sharma
International Journal of Electrical and Computer Engineering (IJECE) | VOL. 12
01 Apr 2022

A comparison of word embeddings for the biomedical natural language processing.
Yanshan Wang ... Feichen Shen
Journal of Biomedical Informatics | VOL. 87
12 Sep 2018

Dictionary-based Debiasing of Pre-trained Word Embeddings
Masahiro Kaneko ... Danushka Bollegala
01 Jan 2020
