Abstract

With the rapid development of online shopping, interpretable personalized fashion recommendation using images has attracted increasing attention in recent years. Existing work can capture users' preferences for visible features and provide visual explanations. However, it ignores invisible features, such as the material and quality of the clothes, and fails to offer textual explanations. To this end, we propose a Visual and Textual Jointly Enhanced Interpretable (VTJEI) model for fashion recommendation based on product images and historical reviews. VTJEI provides more accurate recommendations, together with visual and textual explanations, through the joint enhancement of textual and visual information. Specifically, we design a bidirectional two-layer adaptive attention review model to capture the user's visible and invisible preferences for the target product and to provide textual explanations by highlighting words. Moreover, we propose a review-driven visual attention model to obtain a more personalized image representation, driven by the user's preferences extracted from historical reviews. In this way, we not only realize the joint enhancement of visual and textual information but also provide visual explanations by highlighting image regions. Finally, we performed extensive experiments on real datasets to confirm the superiority of our model on Top-N recommendation. We also built a labeled dataset to quantitatively evaluate the visible and invisible explanations we provide. The results show that our model not only produces more accurate recommendations but also provides both visual and textual explanations.
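As a concrete illustration of the review-driven visual attention described above, the sketch below shows one way such a module could be wired up in PyTorch. This is a minimal sketch, not the authors' implementation: the class name, the dimensions, and the additive attention scoring are assumptions, chosen only to make the idea of weighting image regions by a review-derived preference vector executable.

```python
# Hypothetical sketch of review-driven visual attention: attention weights
# over image regions are computed from a user-preference vector derived from
# the user's historical reviews. All names and dimensions are illustrative.
import torch
import torch.nn as nn


class ReviewDrivenVisualAttention(nn.Module):
    def __init__(self, region_dim: int, review_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Project region features and the review-based preference vector into
        # a shared space, then score each region with a single linear unit.
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.review_proj = nn.Linear(review_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions: torch.Tensor, review_pref: torch.Tensor):
        # regions:     (batch, num_regions, region_dim), e.g. a 7x7 CNN
        #              feature map flattened into 49 region vectors
        # review_pref: (batch, review_dim), preference from past reviews
        h = torch.tanh(self.region_proj(regions)
                       + self.review_proj(review_pref).unsqueeze(1))
        alpha = torch.softmax(self.score(h), dim=1)  # (batch, num_regions, 1)
        image_repr = (alpha * regions).sum(dim=1)    # personalized image vector
        # alpha can be visualized over the image as the visual explanation
        return image_repr, alpha.squeeze(-1)


# Toy usage with random tensors standing in for real features.
regions = torch.randn(2, 49, 512)
review_pref = torch.randn(2, 256)
attn = ReviewDrivenVisualAttention(region_dim=512, review_dim=256)
img_vec, weights = attn(regions, review_pref)
print(img_vec.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```

Because the attention weights are conditioned on the review-derived preference vector, the same product image yields a different representation for different users, which is the joint visual-textual enhancement the abstract describes.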

Highlights

  • When buying fashion products online, users' decisions are primarily affected by the appearance of products [1]

  • We develop a novel framework, the Visual and Textual Jointly Enhanced Interpretable (VTJEI) model, for fashion recommendation

  • In related work, we review existing fashion recommendation and interpretable recommendation methods



Introduction

Nowadays, when buying fashion products online, users' decisions are primarily affected by the appearance of products [1]. However, invisible features that users cannot observe from the image, such as the material and quality of the clothes, also affect their decisions. Most existing methods use pre-trained convolutional models to convert the entire fashion image into a fixed-length global image embedding [2]–[5], which ignores user-specific visual preferences and fails to generate reasonable visual explanations. To solve this problem, Chen et al. [6] introduced region-level visual attention to capture each user's preferences over image regions and provide visual explanations.
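To make the contrast concrete, the sketch below (assuming PyTorch and torchvision, with a ResNet-18 standing in for the pre-trained convolutional models the text refers to; all variable names are illustrative) shows the difference between a fixed-length global image embedding and region-level features that a downstream attention module could weight per user.

```python
# Sketch of the two image representations discussed above, using a
# pre-trained torchvision ResNet-18 (weights API per torchvision >= 0.13).
import torch
from torchvision import models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.eval()

image = torch.randn(1, 3, 224, 224)  # toy image tensor

with torch.no_grad():
    # (a) Fixed-length global embedding: the final pooling collapses the
    # spatial map, so region-level detail (and any user-specific visual
    # preference over regions) is lost.
    backbone = torch.nn.Sequential(*list(cnn.children())[:-1])
    global_emb = backbone(image).flatten(1)           # (1, 512)

    # (b) Region features: keep the last spatial map so a later attention
    # module can weight individual regions per user.
    feature_map = torch.nn.Sequential(*list(cnn.children())[:-2])(image)
    regions = feature_map.flatten(2).transpose(1, 2)  # (1, 49, 512)

print(global_emb.shape, regions.shape)
```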
