Abstract

Conventional image classification methods mostly aim to classify a single object in an image in which an object often occupies a large area. However, images in social network services (SNS) are more complicated. They usually include multiple objects that have much information, such as people, environments, and actions. In this work, we aim at understanding images from SNS and classifying them to categories such as fashion, traveling, education, beauty, and animals. To improve the classification accuracy in such complicated scenario, in this paper, we propose a new framework for high-level image classification by synergizing the image captioning and the Natural Language Processing (NLP) model. First, we use an image captioning model to understand images, which generates text descriptions about the images. Second, we use a natural language processing model to classify the generated text descriptions from the images. In this way, we can classify the images according to the classification results of the generated text descriptions. Our framework includes two models; one is image captioning model, which we use a TensorFlow based visual attention model with the inception V3 model for pre-processing and extracting the image features. The other model is the NLP model, Bidirectional Encoder Representations from Transformers (BERT). We have built a labeled image dataset from Instagram, a popular SNS platform, to test our framework. Our results show that our proposed method has a promising performance in terms of classification accuracy.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.