Abstract

In this paper, we explore how to exploit video context to facilitate fashion parsing. Instead of annotating a large number of fashion images, we present a general, affordable and scalable solution that harnesses the rich contexts in easily available fashion videos to boost any existing fashion parser. First, we crawl a large corpus of unlabelled fashion videos. Then, for each video, cross-frame contexts are exploited for human pose co-estimation and subsequent video co-parsing, yielding satisfactory fashion parsing results for all frames. More specifically, SIFT Flow and superpixel matching are used to build correspondences across frames, and these correspondences then contextualize the pose estimation and fashion parsing in individual frames. Finally, the parsed video frames serve as the reference corpus for the non-parametric fashion parsing component of the whole solution. Extensive experiments on two benchmark fashion datasets, as well as a newly collected and challenging Fashion Icon (FI) dataset, demonstrate the encouraging performance gains from our general fashion parsing pipeline.
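To make the cross-frame correspondence idea concrete, the sketch below shows a minimal superpixel-matching step in pure Python. It is an illustrative assumption, not the paper's implementation: each superpixel is reduced to a hypothetical descriptor (mean RGB colour plus centroid), and each superpixel in one frame is greedily matched to its nearest neighbour in the next frame under a weighted colour-plus-spatial cost. The function names, the descriptor, and the weights are all invented for illustration; the paper's actual pipeline also uses SIFT Flow, which is not reproduced here.

```python
import math

def match_superpixels(frame_a, frame_b, colour_weight=1.0, spatial_weight=0.05):
    """Greedily match each superpixel in frame_a to its nearest
    neighbour in frame_b (an illustrative stand-in for the paper's
    cross-frame correspondence step).

    Each frame is a list of (mean_rgb, centroid) tuples, where
    mean_rgb is (r, g, b) and centroid is (x, y).  Returns a list of
    (index_in_a, index_in_b, matching_cost) triples.
    """
    matches = []
    for i, (rgb_a, pos_a) in enumerate(frame_a):
        best_j, best_cost = None, float("inf")
        for j, (rgb_b, pos_b) in enumerate(frame_b):
            # Appearance term: Euclidean distance in RGB space.
            colour_d = math.dist(rgb_a, rgb_b)
            # Smoothness term: superpixels should not move far
            # between adjacent video frames.
            spatial_d = math.dist(pos_a, pos_b)
            cost = colour_weight * colour_d + spatial_weight * spatial_d
            if cost < best_cost:
                best_j, best_cost = j, cost
        matches.append((i, best_j, best_cost))
    return matches

# Toy usage: a red and a blue superpixel that swap list order
# between frames are still matched by appearance and position.
frame_a = [((200, 30, 30), (10, 10)), ((30, 30, 200), (50, 50))]
frame_b = [((35, 30, 195), (52, 48)), ((198, 28, 32), (12, 9))]
print(match_superpixels(frame_a, frame_b))
```

In the full pipeline such correspondences would then propagate pose and garment labels across frames, so that each frame's parse is constrained by its neighbours rather than estimated in isolation.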
