Abstract

Image captioning is the task of generating a natural-language description of an image. It has a variety of applications, such as image indexing and virtual assistants. In this research, we compared the performance of three word embeddings (GloVe, Word2Vec, and FastText) and six CNN-based feature extraction architectures (Inception V3, InceptionResNet V2, ResNet152 V2, EfficientNet B3 V1, EfficientNet B7 V1, and NASNetLarge), each combined with an LSTM decoder to perform image captioning. We developed the models using images of ten household objects (bed, cell phone, chair, couch, oven, potted plant, refrigerator, sink, table, and tv) obtained from the MSCOCO dataset. We then created five new captions in Bahasa Indonesia for the selected images. The captions may describe the name, location, color, size, and characteristics of an object and its surrounding area. Each of our 18 experimental models used a different combination of word embedding and CNN-based feature extraction architecture, together with an LSTM, during training. As a result, the models that combined Word2Vec with NASNetLarge outperformed the other models at generating Indonesian captions, as measured by the BLEU-4 metric.
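To illustrate the encoder-decoder setup described above, the following is a minimal sketch in Python with TensorFlow/Keras. It assumes a merge-style captioning model in which image features are pre-extracted with NASNetLarge (average-pooled, 4032-dimensional) and caption tokens pass through an embedding layer and an LSTM; the vocabulary size, maximum caption length, and 300-dimensional Word2Vec embeddings are assumed values for illustration, not figures from the paper.

import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 5000      # assumed vocabulary size
max_len = 30           # assumed maximum caption length (in tokens)
embed_dim = 300        # assumed Word2Vec embedding dimensionality
feature_dim = 4032     # NASNetLarge global-average-pooled feature size

# Image branch: features are pre-extracted with NASNetLarge
# (include_top=False, pooling='avg'), so the captioning model
# only receives the pooled feature vector.
img_in = layers.Input(shape=(feature_dim,))
img_dense = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

# Text branch: pretrained Word2Vec vectors would normally be loaded
# into the Embedding layer; random initialization is used here.
txt_in = layers.Input(shape=(max_len,))
txt_emb = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(txt_in)
txt_lstm = layers.LSTM(256)(layers.Dropout(0.5)(txt_emb))

# Merge the image and text representations and predict the next word.
merged = layers.add([img_dense, txt_lstm])
hidden = layers.Dense(256, activation="relu")(merged)
out = layers.Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()

At inference time, a caption would be generated token by token: the partial caption is fed back into the text branch until an end-of-sequence token or the maximum length is reached.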
