Abstract

Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit and for retrieving complementary items for a partial outfit. We present a framework, OutfitTransformer, that uses the proposed task-specific tokens and leverages the self-attention mechanism to learn effective outfit-level representations encoding the compatibility relationships among all items in the entire outfit, addressing both compatibility prediction and complementary item retrieval. For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss. For complementary item retrieval, we design a target item token that additionally takes the target item specification (in the form of a category or text description) into consideration. We train our framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit and a target item specification as inputs. The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit. Additionally, we adopt a pre-training approach and a curriculum learning strategy to improve retrieval performance. Experiments show that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks.
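To make the outfit-token idea concrete, the sketch below shows one plausible reading of the compatibility-prediction branch: a learnable token is prepended to the item embeddings, a transformer encoder applies self-attention across the whole outfit, and the token's output serves as the global outfit representation fed to a classification head. The class name, layer sizes, and use of PyTorch's built-in encoder are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class OutfitCompatibilityModel(nn.Module):
    """Minimal sketch of compatibility prediction with an outfit token.

    Hyperparameters (embed_dim, num_heads, num_layers) are assumptions;
    item embeddings are presumed to come from a separate image/text encoder.
    """

    def __init__(self, embed_dim=128, num_heads=8, num_layers=6):
        super().__init__()
        # Learnable [outfit] token prepended to the item embeddings.
        self.outfit_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Binary head trained with a classification loss (per the abstract).
        self.classifier = nn.Linear(embed_dim, 1)

    def forward(self, item_embeds):
        # item_embeds: (batch, num_items, embed_dim) precomputed item features
        batch = item_embeds.size(0)
        token = self.outfit_token.expand(batch, -1, -1)
        x = torch.cat([token, item_embeds], dim=1)
        x = self.encoder(x)      # self-attention over the token and all items
        outfit_repr = x[:, 0]    # outfit token output = global representation
        return self.classifier(outfit_repr).squeeze(-1)  # compatibility logit

# Usage: score a batch of 2 outfits with 4 items each.
model = OutfitCompatibilityModel()
items = torch.randn(2, 4, 128)
logits = model(items)  # train with nn.BCEWithLogitsLoss against 0/1 labels
```

The retrieval branch described in the abstract would follow the same pattern, swapping the outfit token for a target item token conditioned on the category or text specification and training with the set-wise outfit ranking loss instead of a classification loss.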
