Abstract

Heterogeneous set-to-set matching, as applied to fashion outfit recommendation, depends not on the similarity of items but on the compatibility between items in the sets. Existing state-of-the-art methods apply self- and cross-attention mechanisms to move item vectors closer together or farther apart, and then compute the matching score from the transformed item vectors. However, when sets contain few items, the attention-based transformation is confined to a rather limited space, and the complex score computation in the head network can destabilize training of the entire network. To overcome these problems, we propose adding a trainable set-representative vector to each set and embedding discriminative information about the items into that vector through asymmetric attention mechanisms that we extend for this purpose. This enables dynamic transformation in a wider space and stable training without vanishing gradients, yielding discriminative matching scores. Through experiments on heterogeneous set-to-set matching tasks, namely even-total and fashion outfit matching, we show the effectiveness of the proposed method.
