High-level Image Classification by Synergizing Image Captioning with BERT

Xiaohong Yu, Yoseop Ahn, Jaehoon Jeong

doi:10.1109/ictc52510.2021.9620954

Abstract

Conventional image classification methods mostly aim to classify a single object that occupies a large area of the image. However, images in social network services (SNS) are more complicated. They usually include multiple objects that carry a lot of information, such as people, environments, and actions. In this work, we aim to understand images from SNS and classify them into categories such as fashion, traveling, education, beauty, and animals. To improve classification accuracy in such complicated scenarios, we propose a new framework for high-level image classification that synergizes image captioning with a Natural Language Processing (NLP) model. First, we use an image captioning model to understand the images and generate text descriptions of them. Second, we use an NLP model to classify the generated text descriptions. In this way, each image is classified according to the classification result of its generated description. Our framework includes two models. The first is an image captioning model, for which we use a TensorFlow-based visual attention model, with the Inception V3 model for pre-processing and extracting the image features. The second is the NLP model, Bidirectional Encoder Representations from Transformers (BERT). To test our framework, we built a labeled image dataset from Instagram, a popular SNS platform. Our results show that the proposed method achieves promising classification accuracy.
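
The two-stage idea in the abstract (caption the image, then classify the caption) can be sketched in a few lines of Python. This is only an illustration of how the stages compose, not the authors' implementation: `classify_caption`, `classify_image`, `fake_captioner`, and the `KEYWORDS` table are hypothetical names, the keyword matcher is a deliberately simple stand-in for a fine-tuned BERT classifier, and the captioner is a stub in place of the attention-based captioning model. Only the five category names come from the abstract.

```python
from typing import Callable, Dict, List

# Categories named in the abstract.
CATEGORIES: List[str] = ["fashion", "traveling", "education", "beauty", "animals"]

# Hypothetical keyword lists; a stand-in for a fine-tuned BERT text classifier.
KEYWORDS: Dict[str, List[str]] = {
    "fashion": ["dress", "jacket", "shoes", "outfit"],
    "traveling": ["beach", "mountain", "airport", "suitcase"],
    "education": ["book", "classroom", "laptop", "blackboard"],
    "beauty": ["lipstick", "makeup", "mirror", "nail"],
    "animals": ["dog", "cat", "bird", "horse"],
}


def classify_caption(caption: str) -> str:
    """Stand-in text classifier: choose the category with the most keyword hits.

    Assumes captions are plain lowercase-able words without punctuation.
    Ties are resolved in favor of the earlier entry in CATEGORIES.
    """
    words = caption.lower().split()
    scores = {c: sum(w in KEYWORDS[c] for w in words) for c in CATEGORIES}
    return max(CATEGORIES, key=lambda c: scores[c])


def classify_image(
    image: object,
    captioner: Callable[[object], str],
    text_classifier: Callable[[str], str] = classify_caption,
) -> str:
    """Stage 1: image -> caption. Stage 2: caption -> category label."""
    caption = captioner(image)
    return text_classifier(caption)


def fake_captioner(image: object) -> str:
    """Stub captioner; a real one would run an image captioning model."""
    return "a small dog sitting next to a cat"


if __name__ == "__main__":
    print(classify_image(None, fake_captioner))
```

Passing the captioner and the text classifier as callables keeps the two stages independent, so either stub could be swapped for a real model without changing `classify_image`.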
