Abstract

In this paper, we address the problem of matching shoes in daily-life photos to exactly the same shoes in online shops. The problem is extremely challenging because of the significant visual differences between street-domain images (shoe photos captured in daily-life scenarios) and online-domain images (photos from online shops taken in controlled environments). This paper presents a semantic Shoe Attribute-Guided Convolutional Neural Network (SAG-CNN) to extract deep features. Moreover, we develop a three-level feature representation based on SAG-CNN: deep features extracted at the image, region, and part levels effectively match images across the two domains. We also collect a novel shoe dataset consisting of 8021 street-domain and 5821 online-domain images. Experimental results on this dataset show that the top-20 retrieval accuracy of our approach improves over that of pre-trained CNN features by about 40%.
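The cross-domain retrieval pipeline implied by the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, fusion by concatenation of L2-normalized per-level features, and cosine-similarity ranking are all assumptions for the sake of the example.

```python
import numpy as np

def l2_normalize(v):
    """Scale a feature vector to unit length (no-op on the zero vector)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuse_levels(image_feat, region_feat, part_feat):
    """Hypothetical fusion: concatenate L2-normalized features from the
    image, region, and part levels into one descriptor."""
    return np.concatenate([l2_normalize(image_feat),
                           l2_normalize(region_feat),
                           l2_normalize(part_feat)])

def retrieve_top_k(query_feat, gallery_feats, k=20):
    """Rank online-domain gallery descriptors by cosine similarity to a
    street-domain query descriptor; return indices of the top-k matches."""
    q = l2_normalize(query_feat)
    sims = np.array([float(np.dot(q, l2_normalize(g))) for g in gallery_feats])
    return np.argsort(sims)[::-1][:k]

# Toy usage with random stand-in features (real features would come from SAG-CNN).
rng = np.random.default_rng(0)
query = fuse_levels(rng.normal(size=8), rng.normal(size=8), rng.normal(size=8))
gallery = [fuse_levels(rng.normal(size=8), rng.normal(size=8), rng.normal(size=8))
           for _ in range(5)]
gallery.append(query.copy())  # plant an exact match at index 5
top = retrieve_top_k(query, gallery, k=3)
```

Top-k retrieval accuracy (such as the top-20 figure reported in the abstract) would then be the fraction of queries whose true online match appears among the returned indices.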

