Abstract

Fashion attribute recognition is not a new topic but a core task in understanding fashion from the perspective of computer vision. This paper proposes a structured relation-aware network (sRA-Net), which exploits multiple hidden relations in fashion images to learn enriched and accurate attribute representations and thereby boost the performance of fashion attribute recognition. Specifically, it deconstructs the features of a clothing item into three levels: low-level attribute-related image region information, mid-level attribute dependency information, and high-level clothing look information. To learn these multi-relational embeddings, we present three relation-aware attention mechanisms. The attribute attention mechanism describes the relationships among different attribute vectors through self-attention and uses the resulting attention map to update the attribute embeddings. The spatial attention mechanism then associates each attribute with the image features and enhances the attribute embedding by leveraging the attribute-related image region. Finally, the channel attention mechanism selects attribute-related image feature channels to obtain a more fine-grained attribute embedding. Furthermore, we introduce a structure-aware embedding that constrains attribute recognition from a global perspective by identifying the inner structure of the clothing. Without bells and whistles, sRA-Net outperforms all state-of-the-art attribute recognition methods on two mainstream fashion attribute datasets, namely DeepFashion-C and iFashion-Attribute, by 1%-3%.
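
To make the three attention steps concrete, the sketch below outlines one plausible PyTorch realization of the pipeline the abstract describes. All names, layer sizes, and the residual fusion at the end (`RelationAwareAttention`, `embed_dim`, `channel_gate`) are illustrative assumptions, not the authors' implementation, which the abstract does not specify.

```python
import torch
import torch.nn as nn

class RelationAwareAttention(nn.Module):
    """Minimal sketch of the three relation-aware attention steps:
    attribute self-attention, attribute-to-region spatial attention,
    and per-attribute channel gating. Hypothetical design choices."""

    def __init__(self, num_attrs, embed_dim, feat_channels):
        super().__init__()
        # Learnable embedding per fashion attribute.
        self.attr_embed = nn.Parameter(torch.randn(num_attrs, embed_dim))
        # Attribute attention: self-attention over attribute embeddings.
        self.attr_attn = nn.MultiheadAttention(embed_dim, num_heads=1,
                                               batch_first=True)
        # Project backbone feature channels into the attribute space.
        self.feat_proj = nn.Conv2d(feat_channels, embed_dim, kernel_size=1)
        # Channel attention: per-attribute gate over feature channels.
        self.channel_gate = nn.Linear(embed_dim, embed_dim)

    def forward(self, feat_map):
        # feat_map: (B, C, H, W) backbone features of a fashion image.
        B = feat_map.size(0)
        attrs = self.attr_embed.unsqueeze(0).expand(B, -1, -1)  # (B, A, D)

        # 1) Attribute attention: model dependencies among attributes
        #    and update each embedding from the attention map.
        attrs, _ = self.attr_attn(attrs, attrs, attrs)

        # 2) Spatial attention: attend from each attribute to image
        #    locations, pooling features of the attribute-related region.
        feats = self.feat_proj(feat_map).flatten(2).transpose(1, 2)  # (B, HW, D)
        scores = attrs @ feats.transpose(1, 2) / attrs.size(-1) ** 0.5
        region = torch.softmax(scores, dim=-1) @ feats               # (B, A, D)

        # 3) Channel attention: gate the feature channels relevant to
        #    each attribute for a finer-grained embedding.
        gate = torch.sigmoid(self.channel_gate(attrs))
        return attrs + gate * region  # enriched attribute embeddings
```

Under this reading, each enriched attribute embedding would feed a per-attribute classifier, while the structure-aware embedding described above would impose the additional global constraint; for example, `RelationAwareAttention(num_attrs=1000, embed_dim=256, feat_channels=2048)` applied to a (2, 2048, 7, 7) feature map yields a (2, 1000, 256) tensor of attribute embeddings.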
