NPoSC-A3: A novel part of speech clues-aware adaptive attention mechanism for image captioning

Majjed Al-Qatf, Ammar Hawbani, Xingfu Wang, Amr Abdusallam, Liang Zhao, Saeed Hammod Alsamhi, Edward Curry

doi:10.1016/j.engappai.2023.107732

Abstract

Part of Speech (PoS) information has been widely used in image captioning methods to guide the decoder in deciding whether visual information is required for generating the target word. However, existing methods primarily focus on improving the generation of visual words (VWs) while neglecting non-visual words (NVWs). In response, we introduce a novel PoS clues-aware adaptive attention mechanism (NPoSC-A3) that uses PoS clues to adaptively incorporate visual and semantic attention contexts into the language model, so that both visual and semantic information contribute to generating VWs and NVWs. NPoSC-A3 comprises four key modules: a global semantic context generator (GSCG), a PoS context generator (PoSCG), a PoS predictor (PoSP), and a PoS clues-aware adaptive attention mechanism (PoSC-A3). GSCG generates a global semantic context that the model leverages for generating NVWs. PoSP predicts the PoS of the word to be generated at the current time step. PoSC-A3 adaptively incorporates visual and global semantic features into the decoder based on the PoS guidance. PoSCG constrains the effect of the visual and global semantic contexts on the captioning process so that the generated captions are more syntactically well-formed. Extensive experiments on the standard MSCOCO dataset demonstrate that the proposed method improves image captioning performance, outperforms most recent and advanced image captioning works on the evaluation metrics, and attains a CIDEr score of 127.2.
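
The abstract does not give the formula by which PoS guidance mixes the two contexts, so the following is only a minimal sketch of one plausible reading: a predicted PoS distribution is collapsed into a single "visual word" probability, which then weights a visual context vector against a global semantic context vector. The function names (`softmax`, `blend_contexts`), the tag set, and the convex-combination rule are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_contexts(visual_ctx, semantic_ctx, pos_logits, visual_tags):
    """Hypothetical PoS-guided gate (an assumption, not the paper's formula).

    pos_logits  : unnormalised scores over PoS tags for the next word
    visual_tags : indices of tags treated as "visual" (e.g. nouns)
    Returns the mixed context and the gate value p_visual.
    """
    pos_probs = softmax(pos_logits)
    # Probability mass assigned to tags that usually need visual evidence.
    p_visual = sum(p for i, p in enumerate(pos_probs) if i in visual_tags)
    # Convex combination: visual words lean on the visual context,
    # non-visual words lean on the global semantic context.
    mixed = [p_visual * v + (1.0 - p_visual) * s
             for v, s in zip(visual_ctx, semantic_ctx)]
    return mixed, p_visual


# Toy example with three assumed tags: 0 = noun, 1 = determiner, 2 = preposition.
mixed, p_visual = blend_contexts(
    visual_ctx=[1.0, 0.0],
    semantic_ctx=[0.0, 1.0],
    pos_logits=[2.0, 0.0, 0.0],
    visual_tags={0},
)
```

In this toy setting a high noun score pushes the gate toward the visual context, while a high determiner or preposition score would push it toward the semantic context. This mirrors the VW/NVW distinction described in the abstract without claiming to reproduce the actual model.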

Journal: Engineering Applications of Artificial Intelligence
Publication Date: Jan 4, 2024
Citations: 1
