Abstract

Fashion is a multi-billion-dollar industry with direct social, cultural, and economic implications in the real world. While computer vision has demonstrated remarkable success in fashion-domain applications, natural language processing has also begun to contribute to the area by building connections between clothing images and human semantic understanding. A fundamental task in combining image and language understanding is generating a natural language sentence that accurately summarizes the content of a clothing image. In this paper, we develop a joint attribute detection and visual attention framework for clothing image captioning. Specifically, in order to involve more clothing attributes in learning, we first utilize a pre-trained Convolutional Neural Network (CNN) to learn features that characterize richer information about clothing attributes. Based on these learned features, we then adopt an encoder/decoder framework in which we encode the clothing features and feed them into a Long Short-Term Memory (LSTM) language model that decodes the clothing descriptions. The method greatly enhances the performance of clothing image captioning and reduces misleading attention. Extensive experiments based on real-world data verify the effectiveness of the proposed method.
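To make the described pipeline concrete, below is a minimal sketch of a CNN-encoder / LSTM-decoder captioner with attribute-aware visual attention, in the spirit of the abstract. It is not the paper's implementation: the class name `AttributeCaptioner`, the ResNet-50 backbone, the additive attention scorer, and all dimensions (`embed_dim`, `hidden_dim`, `attr_dim`) are illustrative assumptions.

```python
# Hypothetical sketch of the encoder/decoder captioning pipeline described above.
# Assumptions: ResNet-50 backbone, single LSTMCell decoder, teacher forcing.
import torch
import torch.nn as nn
import torchvision.models as models


class AttributeCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, attr_dim=128):
        super().__init__()
        # Encoder: pre-trained CNN; keep the spatial feature map for attention.
        backbone = models.resnet50(weights="IMAGENET1K_V1")
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 7, 7)
        self.attr_head = nn.Linear(2048, attr_dim)        # clothing-attribute features
        self.att_proj = nn.Linear(2048 + hidden_dim, 1)   # additive attention score
        # Decoder: LSTM language model conditioned on attended visual features.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + 2048 + attr_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images)                          # (B, 2048, 7, 7)
        B, C, H, W = feats.shape
        regions = feats.view(B, C, H * W).transpose(1, 2)     # (B, 49, 2048)
        attrs = self.attr_head(regions.mean(dim=1))           # (B, attr_dim)
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(captions.size(1)):
            # Visual attention: score each spatial region against the decoder state.
            scores = self.att_proj(
                torch.cat([regions, h.unsqueeze(1).expand(-1, H * W, -1)], dim=-1)
            )                                                  # (B, 49, 1)
            alpha = scores.softmax(dim=1)
            context = (alpha * regions).sum(dim=1)             # (B, 2048)
            word = self.embed(captions[:, t])                  # teacher forcing
            h, c = self.lstm(torch.cat([word, context, attrs], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                      # (B, T, vocab_size)
```

In this sketch the attribute features are pooled globally and fed to the decoder at every step, while the attention weights re-focus on spatial regions per word; the paper's joint attribute detection and attention mechanism may differ in detail.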
