Abstract

Fashion compatibility modeling, which estimates how well a given set of fashion items match, has received increasing attention in recent years. However, existing studies often fail to fully leverage multimodal information, or they ignore the semantic guidance that clothing categories can provide in making multimodal information more reliable. In this paper, we propose a fashion compatibility modeling approach with a category-aware multimodal attention network, termed FCM-CMAN. FCM-CMAN enriches and aggregates the multimodal representations of fashion items by using dynamic category representations together with a contextual attention mechanism. Specifically, because category correlations are dynamic and vary across fashion items, we design a categorical dynamic graph convolutional network to adaptively learn the semantic correlations between categories. Combining these with the multi-layered visual outputs of a convolutional neural network and the surrounding contextual information yields multiple content-aware category representations and context-aware attention weights, which characterize fashion items from different aspects. These two kinds of information are then integrated by a multimodal factorized bilinear pooling strategy to generate visual-semantic embeddings, which are further refined by a multi-head self-attention mechanism to capture the elements most relevant to fashion compatibility. Extensive experiments on the FashionVC and ExpFashion datasets demonstrate the superiority of FCM-CMAN over state-of-the-art methods.
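As an illustration of the multimodal factorized bilinear pooling step mentioned above, the following is a minimal NumPy sketch of the commonly used formulation: project both inputs, multiply element-wise, sum-pool over windows of size k, then apply power and L2 normalization. The abstract does not give the exact variant, dimensions, or parameters used in FCM-CMAN, so the function name `mfb_pool`, the projection matrices `U` and `V`, and all sizes here are assumptions chosen for illustration only.

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    """Fuse two feature vectors with factorized bilinear pooling (sketch).

    x: (m,) feature from one modality, e.g. a visual feature (assumed)
    y: (n,) feature from the other, e.g. a category feature (assumed)
    U: (m, k*o) and V: (n, k*o) projection matrices (hypothetical, random here)
    k: number of factors summed per output dimension
    """
    joint = (x @ U) * (y @ V)              # element-wise product, shape (k*o,)
    o = joint.shape[-1] // k
    z = joint.reshape(o, k).sum(axis=1)    # sum pooling over windows of size k
    z = np.sign(z) * np.sqrt(np.abs(z))    # power normalization
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z     # L2 normalization

# Toy usage with random inputs; the sizes are illustrative only.
rng = np.random.default_rng(0)
m, n, k, o = 8, 6, 3, 4
x = rng.standard_normal(m)
y = rng.standard_normal(n)
U = rng.standard_normal((m, k * o))
V = rng.standard_normal((n, k * o))
z = mfb_pool(x, y, U, V, k)
```

In a trained model `U` and `V` would be learned parameters rather than random matrices, and the fused vector `z` would be the input to the subsequent self-attention stage.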
