VieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese

Doanh Bui Cao, Nguyen Duy Vo, Thuan Trong Nguyen, Truc Thi Thanh Trinh, Vu Duc Nguyen

doi:10.25073/2588-1086/vnucsce.371

Abstract

Automatic image caption generation attracts both the Computer Vision and Natural Language Processing research communities because it lies at the intersection of the two fields. We took part in the VieCap4H contest organized by VLSP 2021 and present a Transformer-based solution for image captioning in the healthcare domain. Specifically, we use grid features as the visual representation and pre-train a BERT-based language model, starting from the PhoBERT-base pre-trained model, to obtain the language representation used in the Adaptive Decoder module of the RSTNet model. We also identify a suitable training schedule with the self-critical sequence training (SCST) technique to achieve the best results. In our experiments, we achieve an average BLEU score of 30.3% on the public-test round and 28.9% on the private-test round, ranking 3rd and 4th, respectively. Source code is available at https://github.com/caodoanh2001/uit-vlsp-viecap4h-solution.
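The abstract names SCST but does not spell out its objective. As a rough illustration only, the sketch below follows the commonly cited formulation of self-critical training, in which the reward of a greedily decoded caption serves as the baseline for a sampled caption. The function name `scst_loss`, its arguments, and the choice of reward metric are assumptions made for this example; they are not taken from the authors' repository, and the authors' actual baseline and reward may differ.

```python
from typing import Sequence


def scst_loss(
    sample_log_probs: Sequence[float],
    sample_rewards: Sequence[float],
    greedy_rewards: Sequence[float],
) -> float:
    """Batch-averaged self-critical loss (illustrative sketch).

    sample_log_probs: summed token log-probabilities of each sampled caption.
    sample_rewards:   reward (e.g. a caption metric) of each sampled caption.
    greedy_rewards:   reward of the greedy-decoded caption for the same image,
                      used as the baseline (an assumption of this sketch).
    """
    n = len(sample_log_probs)
    if n == 0 or len(sample_rewards) != n or len(greedy_rewards) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for log_p, r_sample, r_greedy in zip(
        sample_log_probs, sample_rewards, greedy_rewards
    ):
        # Positive advantage: the sample beat the greedy caption, so
        # minimizing the loss raises the sample's log-probability.
        advantage = r_sample - r_greedy
        total += -advantage * log_p
    return total / n


if __name__ == "__main__":
    # Hypothetical rewards for two images; the values are made up.
    print(scst_loss([-2.0, -1.0], [0.5, 0.2], [0.3, 0.4]))
```

In a real training loop the log-probabilities would be differentiable tensors and the rewards would be treated as constants, so only the sampled caption's likelihood receives a gradient.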


More From: VNU Journal of Science: Computer Science and Communication Engineering


Similar Papers

Recall What You See Continually Using GridLSTM in Image Captioning
Lingxiang Wu ... Jinqiao Wang
IEEE Transactions on Multimedia | VOL. 22
13 Aug 2019

Exploring deep learning approaches for video captioning: A comprehensive review
Adel Jalal Yousif ... Mohammed H Al-Jammas
e-Prime - Advances in Electrical Engineering, Electronics and Energy | VOL. 6
22 Nov 2023

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang ... Li Deng
IEEE Journal of Selected Topics in Signal Processing | VOL. 14
01 Mar 2020

MBA: A Multimodal Bilinear Attention Model with Residual Connection for Abstractive Multimodal Summarization
Xia Ye ... Zengying Yue
Journal of Physics: Conference Series | VOL. 1856
01 Apr 2021
