Abstract

Existing methods for image captioning usually train the language model with a cross-entropy loss, which leads to exposure bias and a mismatch between the training objective and the evaluation metrics. Recent research has shown that both issues can be addressed by policy-gradient methods from reinforcement learning, owing to their ability to directly optimize discrete, non-differentiable evaluation metrics. In this paper, we train an image captioning model with reinforcement learning. Specifically, we maximize the overall reward of the generated sentences by adopting temporal-difference (TD) learning, which takes the correlation between temporally successive actions into account. Accordingly, we assign different values to different words of a sampled sentence through a discount coefficient when back-propagating the gradient with the REINFORCE algorithm, enabling the correlation between actions to be learned. Moreover, instead of estimating a "baseline" to normalize the rewards with a separate network, we use the reward of another Monte-Carlo sample as the baseline, which avoids high variance. We show that the proposed method improves the quality of generated captions and outperforms state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics.
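
The update described above can be made concrete with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not the paper's implementation: the function name `td_reinforce_loss`, the exponential discount schedule, and the tensor shapes are our assumptions. It weights each word's log-probability by a discounted advantage, where the advantage is the sampled caption's metric score minus the score of a second Monte-Carlo sample.

```python
import torch

def td_reinforce_loss(log_probs, reward, baseline_reward, gamma=0.95):
    """Hypothetical sketch of a discounted REINFORCE update.

    log_probs       -- (T,) tensor of log p(w_t | w_<t, image) for one
                       sampled caption of length T
    reward          -- scalar sentence-level score (e.g. CIDEr) of the
                       sampled caption
    baseline_reward -- score of a second Monte-Carlo sample, used as the
                       baseline instead of a learned value network
    gamma           -- assumed discount factor; the paper defines the
                       actual weighting scheme
    """
    T = log_probs.size(0)
    # Advantage: subtracting a second sample's reward reduces variance
    # without training an extra baseline estimator.
    advantage = reward - baseline_reward
    # Discounted per-word weights: under this assumed schedule, words
    # closer to the terminal reward receive larger credit.
    weights = gamma ** torch.arange(T - 1, -1, -1, dtype=log_probs.dtype)
    # Policy-gradient loss: the negative discounted, advantage-weighted
    # log-likelihood of the sampled caption.
    return -(weights * advantage * log_probs).sum()
```

Because both captions are drawn from the same policy, the second sample is independent of the first, so subtracting its reward leaves the gradient unbiased while shrinking its variance; this is what allows the method to dispense with a separately trained critic.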
