Abstract

Fashion image retrieval is an important service on e-commerce platforms and the basis of various fashion-related AI applications. Studies have shown that in a multi-modal setting (images plus attribute labels), embedding items into attribute-specific spaces supports more fine-grained similarity measures, which is especially suitable for fashion retrieval tasks. In this paper, we propose an attention-based attribute-guided similarity learning network (AttnFashion) for fashion image retrieval. The core of the network is an attribute-guided spatial attention module and an attribute-guided channel attention module, which capture the mapping between attributes and image regions and the mapping between attributes and high-level image semantics, respectively. To enable deep interaction between the two modules, we design a parallel structure in which they share attribute embeddings and guide each other to extract attribute-specific features, which also reduces the number of parameters in the attention modules. An adaptive feature fusion strategy is proposed to synthesise the features extracted by the two modules. Extensive experiments show that the proposed AttnFashion outperforms current competitive networks on fine-grained attribute-based fashion retrieval.
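The abstract does not give the modules' exact formulation, so the following PyTorch-style sketch only illustrates the general idea: a shared attribute embedding drives a spatial branch (attribute-to-region weights) and a channel branch (attribute-to-semantics gates), whose outputs are combined by a learned fusion gate. All names (AttributeGuidedAttention, spatial_proj, fusion_gate), layer sizes, and the gated fusion rule are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AttributeGuidedAttention(nn.Module):
    # Hypothetical sketch of the two attribute-guided attention branches
    # described in the abstract; the real AttnFashion details differ.
    def __init__(self, num_attrs: int, channels: int, embed_dim: int = 128):
        super().__init__()
        # Shared attribute embedding used by both branches (parallel structure).
        self.attr_embed = nn.Embedding(num_attrs, embed_dim)
        # Spatial branch: attribute embedding -> per-location relevance query.
        self.spatial_proj = nn.Linear(embed_dim, channels)
        # Channel branch: attribute embedding -> per-channel gate.
        self.channel_proj = nn.Linear(embed_dim, channels)
        # Adaptive fusion: a learned scalar gate balancing the two branches.
        self.fusion_gate = nn.Linear(2 * channels, 1)

    def forward(self, feat: torch.Tensor, attr: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map; attr: (B,) attribute ids.
        b, c, h, w = feat.shape
        e = self.attr_embed(attr)                          # (B, D)

        # Attribute-guided spatial attention: weight image regions by their
        # relevance to the queried attribute.
        q = self.spatial_proj(e)                           # (B, C)
        logits = torch.einsum('bc,bchw->bhw', q, feat) / c ** 0.5
        s = torch.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)
        f_spatial = (feat * s).sum(dim=(2, 3))             # (B, C)

        # Attribute-guided channel attention: gate high-level semantic
        # channels according to the same attribute embedding.
        gate = torch.sigmoid(self.channel_proj(e))         # (B, C)
        f_channel = feat.mean(dim=(2, 3)) * gate           # (B, C)

        # Adaptive feature fusion of the two branch outputs.
        a = torch.sigmoid(self.fusion_gate(torch.cat([f_spatial, f_channel], dim=1)))
        return a * f_spatial + (1 - a) * f_channel         # attribute-specific embedding

For example, with a CNN backbone producing feat of shape (B, 512, 7, 7) and attr holding the index of the queried attribute type, the module returns one 512-d attribute-specific embedding per image, which can then be compared under a per-attribute similarity measure.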
