Abstract

In recent years, deep learning-based recommender systems have received increasing attention, as deep neural networks can detect important product features in images and text descriptions and capture them in semantic vector representations of items. This is especially relevant for outfit recommendation, since a variety of fashion product features play a role in creating outfits. This work is a comparative study of fusion methods for outfit recommendation that combine relevant product features extracted from visual and textual data into semantic, multimodal item representations. We compare traditional fusion methods with attention-based fusion methods, which are designed to focus on the fine-grained product features of items. We evaluate the fusion methods on four benchmark datasets for outfit recommendation and provide insights into the importance of the multimodality and granularity of the fashion item representations. We find that the visual and textual item data not only share product features but also contain complementary product features for the outfit recommendation task, confirming the need to effectively combine them into multimodal item representations. Furthermore, we show that the average performance of attention-based fusion methods surpasses the average performance of traditional fusion methods on three out of the four benchmark datasets, demonstrating the ability of attention to learn relevant correlations among fine-grained fashion attributes.
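To make the contrast between the two families of fusion methods concrete, the sketch below illustrates one possible attention-based fusion module in PyTorch, set against simple concatenation as a traditional baseline. The layer names, feature dimensions, and pooling choices are illustrative assumptions for exposition only; they are not the authors' implementation or the specific methods evaluated in the paper.

```python
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Traditional fusion (assumed baseline): project each modality and concatenate."""

    def __init__(self, visual_dim=2048, text_dim=768, fused_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.text_proj = nn.Linear(text_dim, fused_dim)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (batch, visual_dim) pooled image features
        # text_feats:   (batch, text_dim)   pooled description features
        return torch.cat([self.visual_proj(visual_feats),
                          self.text_proj(text_feats)], dim=-1)


class AttentionFusion(nn.Module):
    """Attention-based fusion (illustrative): textual tokens attend over visual
    regions so fine-grained product features from both modalities can interact."""

    def __init__(self, visual_dim=2048, text_dim=768, fused_dim=512, num_heads=4):
        super().__init__()
        # Project both modalities into a shared space before attention.
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.cross_attn = nn.MultiheadAttention(embed_dim=fused_dim,
                                                num_heads=num_heads,
                                                batch_first=True)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (batch, n_regions, visual_dim) e.g. CNN region features
        # text_feats:   (batch, n_tokens,  text_dim)   e.g. token embeddings
        v = self.visual_proj(visual_feats)
        t = self.text_proj(text_feats)
        # Each textual token attends over visual regions (fine-grained alignment).
        attended, _ = self.cross_attn(query=t, key=v, value=v)
        # Mean-pool both streams into a single multimodal item representation.
        return torch.cat([attended.mean(dim=1), v.mean(dim=1)], dim=-1)


if __name__ == "__main__":
    # Toy shapes: 8 items, 49 image regions, 16 description tokens.
    visual = torch.randn(8, 49, 2048)
    text = torch.randn(8, 16, 768)
    fused = AttentionFusion()(visual, text)
    print(fused.shape)  # torch.Size([8, 1024])
```

The key design difference is that concatenation mixes whole-item embeddings, whereas the attention variant lets individual textual tokens weight individual image regions, which is one way to model the fine-grained attribute correlations the abstract refers to.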
