Abstract

Image captioning (IC) systems are generally based on an encoder–decoder architecture, where convolutional neural networks (CNNs) represent an image with discriminative features and recurrent neural networks (RNNs) sequentially generate a sentence description. Even though considerable effort has lately been devoted to designing reliable IC systems, the task is far from solved. The generated descriptions can be affected by various errors related to the attributes and objects present in the scene. Moreover, once an error occurs, it can propagate through the recurrent layers of the decoder, yielding inaccurate descriptions. To address this problem, we propose two postprocessing strategies that rectify errors in the generated descriptions and improve their quality. Both strategies are based on hidden Markov models (HMMs) and the Viterbi algorithm. They can be applied to any encoder–decoder IC system and operate at test time, once the IC system is trained. In particular, we propose to rectify a sentence either after it is fully generated (post-generation strategy) or at each time step of the generation process (in-generation strategy). Experiments conducted on four different IC datasets confirm the promising ability of the proposed postprocessing strategies to rectify the output of a simple encoder–decoder, producing more coherent descriptions. The achieved results are competitive with, and sometimes better than, those of complex IC systems.
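To illustrate the kind of rectification the abstract describes, the sketch below runs log-space Viterbi decoding over a toy HMM whose hidden states are intended words and whose observations are the words the decoder actually emitted. The vocabulary, transition, and emission probabilities here are invented for illustration only; they are not the paper's trained model, and the post-generation strategy is simply "decode the full emitted sentence once it is complete".

```python
import numpy as np

# Toy setup (all words and probabilities are invented for illustration):
# hidden states = intended words, observations = words the decoder emitted.
VOCAB = ["a", "dog", "cat", "barks"]

START_P = np.array([0.97, 0.01, 0.01, 0.01])  # P(first word)
TRANS_P = np.array([                          # P(next | current): a toy language model
    [0.01, 0.49, 0.49, 0.01],   # after "a"
    [0.01, 0.01, 0.01, 0.97],   # after "dog"  -> "barks" is likely
    [0.94, 0.02, 0.02, 0.02],   # after "cat"  -> "barks" is unlikely
    [0.25, 0.25, 0.25, 0.25],   # after "barks"
])
EMIT_P = np.array([                           # P(emitted | intended): "dog"/"cat" get confused
    [0.94, 0.02, 0.02, 0.02],
    [0.02, 0.55, 0.41, 0.02],
    [0.02, 0.41, 0.55, 0.02],
    [0.02, 0.02, 0.02, 0.94],
])

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden word sequence for the observed one (log-space Viterbi)."""
    n, m = len(obs), len(start_p)
    v = np.empty((n, m))                  # best log-probability of a path ending in each state
    back = np.zeros((n, m), dtype=int)    # backpointers for path recovery
    v[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, n):
        for s in range(m):
            scores = v[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(scores))
            v[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(v[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# The decoder produced "a cat barks"; under this toy HMM the Viterbi path
# prefers the more coherent "a dog barks".
generated = [VOCAB.index(w) for w in ["a", "cat", "barks"]]
corrected = [VOCAB[i] for i in viterbi(generated, START_P, TRANS_P, EMIT_P)]
print(" ".join(corrected))  # -> a dog barks
```

The in-generation variant would instead run this decoding on the growing prefix at each time step, so a corrected word can condition the decoder's next prediction before the error propagates.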
