Abstract

There is increasing interest in using street photos to understand fashion trends. Although street photos usually contain rich clothing information, analyzing them poses several technical challenges. First, street photos collected from social media sites often carry noisy user-provided labels, and training models on these labels can degrade prediction performance. Second, most existing methods predict each clothing attribute individually and do not exploit the knowledge that related tasks could share. Beyond these technical challenges, most fashion image datasets created by previous studies focus on American and European fashion styles. To address these challenges and to understand fashion trends in Asia, we created RichWear, a new street fashion dataset containing 322,198 images with various text labels for fashion analysis. Collected from an Asian social network site, the dataset focuses on street styles in Japan and other Asian areas. In addition to user-provided noisy labels, RichWear provides a subset of expert-verified labels for model training and evaluation. We propose the Fashion Attributes Recognition Network (FARNet), based on the multi-task learning framework, to improve fashion recognition. Instead of predicting each clothing attribute individually, FARNet predicts three types of attributes simultaneously; once trained, the network leverages the noisy labels and generates corrected labels based on the input images. Experimental results show that this approach significantly outperforms existing methods. Applying the trained model to the RichWear dataset, we report Asian fashion trends and street styles based on predicted labels and on image clusters derived from latent feature vectors.
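The abstract reports image clusters built from latent feature vectors but does not name the clustering algorithm. As one plausible reading, and an assumption rather than the paper's stated method, the sketch below runs plain k-means (Lloyd's algorithm with deterministic farthest-first seeding) over such vectors. The function names, the toy 2-d vectors, and the choice of k are all hypothetical; real vectors would come from the trained network's latent layer, and at RichWear's scale a library implementation would normally replace this code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def farthest_first_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic maximin seeding: start from the first point, then
    repeatedly add the point farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(
            squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns (cluster index per point, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_init(points, k)
    assignments: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignments, centroids


# Toy 2-d "latent vectors" forming two obvious groups (hypothetical data).
latent = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(latent, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each resulting cluster can then be inspected (for example, by viewing its member images) to characterize a street style; the deterministic seeding is chosen only so the example is reproducible.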

Highlights

  • As interest has increased in the possible relationships between artificial intelligence (AI) and fashion, more and more approaches are being proposed for fashion recognition and understanding

  • We have observed that most datasets used for street fashion research [4], [5], [12]–[15] are collected from social media sites based in the United States and Europe, and these images are mainly related to American and European street styles

  • The benefit of multi-task learning (MTL) compared with single-task learning (STL) is that it allows for exploration of latent connections and facilitates knowledge sharing between tasks, leading to an overall improvement of model performance [21]
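The knowledge-sharing idea in the last highlight is commonly realized through hard parameter sharing: one shared representation feeds a separate output head per task, and training minimizes a weighted sum of the per-task losses. The sketch below shows only that forward pass and joint objective (no gradient step). FARNet's actual layers, its three attribute types, and its loss weighting are not given here, so the task names, sizes, and equal weights are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights stored row-wise, bias)


def linear(x: Sequence[float], layer: Layer) -> List[float]:
    """Affine map W @ x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(a - m) for a in logits))
    return [a - log_z for a in logits]


def multitask_loss(x: Sequence[float], shared: Layer,
                   heads: Dict[str, Layer], targets: Dict[str, int],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task cross-entropy losses.

    Every head reads the same hidden vector h, so during training the
    gradient of each task's loss flows back into the shared layer.
    """
    h = relu(linear(x, shared))  # representation shared by all tasks
    total = 0.0
    for task, head in heads.items():
        log_probs = log_softmax(linear(h, head))
        total += task_weights[task] * -log_probs[targets[task]]
    return total


# Hypothetical example: 2-d input, 2-unit shared layer, three task heads
# with 2, 3 and 4 classes. All head weights are zero, so each head
# predicts a uniform distribution and its loss is log(num_classes).
shared_layer: Layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
task_heads = {
    "task_a": ([[0.0, 0.0]] * 2, [0.0] * 2),
    "task_b": ([[0.0, 0.0]] * 3, [0.0] * 3),
    "task_c": ([[0.0, 0.0]] * 4, [0.0] * 4),
}
loss = multitask_loss([0.5, -0.5], shared_layer, task_heads,
                      targets={"task_a": 1, "task_b": 0, "task_c": 3},
                      task_weights={"task_a": 1.0, "task_b": 1.0, "task_c": 1.0})
print(loss)  # log(2) + log(3) + log(4) = log(24), about 3.178
```

In a single-task setup each attribute would get its own network and its own loss; here the three losses are summed over one shared layer, which is what lets evidence from one task shape the features used by the others.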


Summary

Introduction

As interest has increased in the possible relationships between artificial intelligence (AI) and fashion, more and more approaches are being proposed for fashion recognition and understanding. Street photos posted on social media sites provide much-needed data for AI research, and such large-scale collections of street images have led researchers to analyze street fashion using techniques such as deep learning [4]–[10] and natural language processing [11]. We have observed that most datasets used for street fashion research [4], [5], [12]–[15] are collected from social media sites based in the United States and Europe, and these images mainly reflect American and European street styles. Fashion data collected from the internet also usually contain user-provided labels that are inconsistent with the images; such labels are referred to as noisy labels.
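Where a subset of images carries expert-verified labels alongside the user-provided ones, as RichWear's does according to the abstract, the extent of label noise can be measured directly on that subset. The sketch below is a minimal illustration, assuming each label set is a plain mapping from a hypothetical image ID to a single label string (not the dataset's actual schema): it reports the fraction of disagreements and tallies which user labels are confused with which expert labels.

```python
from collections import Counter
from typing import Hashable, Mapping, Tuple


def label_noise_report(
    user_labels: Mapping[Hashable, str],
    expert_labels: Mapping[Hashable, str],
) -> Tuple[float, Counter]:
    """Compare user-provided labels with expert-verified labels.

    Only images present in both mappings are compared. Returns the observed
    noise rate (fraction of disagreeing labels) and a Counter of
    (user_label, expert_label) pairs for the disagreements.
    """
    common = user_labels.keys() & expert_labels.keys()
    if not common:
        raise ValueError("no image has both a user label and an expert label")
    confusions = Counter(
        (user_labels[i], expert_labels[i])
        for i in common
        if user_labels[i] != expert_labels[i]
    )
    return sum(confusions.values()) / len(common), confusions


# Hypothetical image IDs and labels, for illustration only.
user = {"img1": "skirt", "img2": "jeans", "img3": "coat", "img4": "skirt"}
expert = {"img1": "skirt", "img2": "trousers", "img3": "coat", "img5": "dress"}
rate, confusions = label_noise_report(user, expert)
print(rate)        # 0.333... (1 of the 3 shared images disagrees)
print(confusions)  # Counter({('jeans', 'trousers'): 1})
```

The confusion tally is often more informative than the single rate, since systematic pairs (one label routinely used where experts chose another) point to label conventions rather than random errors.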
