Abstract

Conventional image classification methods mostly aim to classify a single object that occupies a large area of the image. However, images on social network services (SNS) are more complicated: they usually contain multiple objects that carry rich information, such as people, environments, and actions. In this work, we aim to understand SNS images and classify them into categories such as fashion, traveling, education, beauty, and animals. To improve classification accuracy in such complicated scenarios, we propose a new framework for high-level image classification that combines an image captioning model with a natural language processing (NLP) model. First, the image captioning model generates a text description of each image. Second, the NLP model classifies the generated description. Each image is then assigned the class predicted for its description. The framework comprises two models. The image captioning model is a TensorFlow-based visual attention model that uses Inception V3 to pre-process the images and extract their features. The NLP model is Bidirectional Encoder Representations from Transformers (BERT). To test the framework, we built a labeled image dataset from Instagram, a popular SNS platform. Our results show that the proposed method achieves promising classification accuracy.
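
The two-stage flow described above (caption the image, then classify the caption) can be sketched in a few lines of Python. This is only a minimal illustration of the control flow, not the authors' implementation: `stub_captioner` stands in for the attention-based captioning model, `keyword_classifier` stands in for the BERT classifier, and the file names, canned captions, and keyword lists are all hypothetical. Only the category names come from the abstract.

```python
from typing import Callable, Dict, List

# Category names come from the abstract; the keyword lists are invented
# purely for illustration and are not part of the paper.
KEYWORDS: Dict[str, List[str]] = {
    "fashion": ["dress", "jacket", "outfit"],
    "traveling": ["beach", "mountain", "airport"],
    "education": ["classroom", "book", "lecture"],
    "beauty": ["makeup", "lipstick", "skincare"],
    "animals": ["dog", "cat", "bird"],
}


def stub_captioner(image_path: str) -> str:
    """Hypothetical stand-in for the image captioning model.

    A real system would run the image through a feature extractor and an
    attention decoder; here we return canned captions for two fake files.
    """
    canned = {
        "img_001.jpg": "a woman in a red dress standing on a street",
        "img_002.jpg": "a dog playing with a ball in a park",
    }
    return canned.get(image_path, "")


def keyword_classifier(caption: str) -> str:
    """Hypothetical stand-in for the text classifier.

    Counts keyword hits per category and returns the best-scoring one,
    or "unknown" when no keyword matches.
    """
    tokens = caption.lower().split()
    scores = {
        label: sum(tokens.count(word) for word in words)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] > 0 else "unknown"


def classify_image(
    image_path: str,
    captioner: Callable[[str], str],
    text_classifier: Callable[[str], str],
) -> str:
    """Stage 1: image -> caption. Stage 2: caption -> category."""
    caption = captioner(image_path)
    return text_classifier(caption)


if __name__ == "__main__":
    for path in ["img_001.jpg", "img_002.jpg"]:
        print(path, classify_image(path, stub_captioner, keyword_classifier))
```

Passing the two stages in as callables keeps them independent, so either stub could be swapped for a trained model without changing `classify_image`.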
