Abstract

Fashion image captioning is a critical task in the fashion industry that aims to automatically generate product descriptions for fashion items. However, existing fashion image captioning models predict a fixed caption for a particular fashion item once deployed, which does not cater to unique preferences. We explore a controllable way of fashion image captioning that allows the users to specify a few semantic attributes to guide the caption generation. Our approach utilizes semantic attributes as a control signal, giving users the ability to specify particular fashion attributes (e.g., stitch, knit, sleeve, etc.) and styles (e.g., cool, classic, fresh, etc.) that they want the model to incorporate when generating captions. By providing this level of customization, our approach creates more personalized and targeted captions that suit individual preferences. To evaluate the effectiveness of our proposed approach, we clean, filter, and assemble a new fashion image caption dataset called FACAD170K from the current FACAD dataset. This dataset facilitates learning and enables us to investigate the effectiveness of our approach. Our results demonstrate that our proposed approach outperforms existing fashion image captioning models as well as conventional captioning methods. Besides, we further validate the effectiveness of the proposed method on the MSCOCO and Flickr30K captioning datasets and achieve competitive performance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.