Abstract
In this paper, we address the problem of matching shoes in daily-life photos to exactly the same shoes in online shop images. The problem is extremely challenging because of the significant visual differences between street-domain images (shoe photos captured in daily-life scenarios) and online-domain images (photos from online shops taken in a controlled environment). This paper presents a semantic Shoe Attribute-Guided Convolutional Neural Network (SAG-CNN) to extract deep features. Building on SAG-CNN, we develop a three-level feature representation; the deep features extracted at the image, region, and part levels effectively match images across the two domains. We also collect a new shoe dataset consisting of 8,021 street-domain and 5,821 online-domain images. Experimental results on this dataset show that the top-20 retrieval accuracy of our approach improves over that of pre-trained CNN features by about 40%.
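As a rough illustration of the retrieval setting described above, the sketch below fuses hypothetical image-, region-, and part-level feature vectors into one descriptor and ranks online-domain images by cosine similarity. It is a minimal sketch under assumed names (l2_normalize, fuse_levels, rank_online_images) and random stand-in features, not the paper's actual SAG-CNN implementation.

```python
# Illustrative sketch only: assumes per-level feature extractors (e.g. a
# SAG-CNN-like network) already exist; all names here are hypothetical.
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so cosine similarity is a dot product."""
    return v / (np.linalg.norm(v) + eps)

def fuse_levels(image_feat, region_feat, part_feat):
    """Concatenate per-level descriptors into a single cross-domain descriptor."""
    return l2_normalize(np.concatenate([l2_normalize(image_feat),
                                        l2_normalize(region_feat),
                                        l2_normalize(part_feat)]))

def rank_online_images(query_desc, online_descs, top_k=20):
    """Return indices of the top_k online-domain images by cosine similarity."""
    sims = online_descs @ query_desc  # descriptors are unit-normalized
    return np.argsort(-sims)[:top_k]

# Toy usage with random stand-in features (real features would come from the CNN).
rng = np.random.default_rng(0)
query = fuse_levels(rng.normal(size=512), rng.normal(size=512), rng.normal(size=512))
gallery = np.stack([fuse_levels(rng.normal(size=512),
                                rng.normal(size=512),
                                rng.normal(size=512)) for _ in range(100)])
print(rank_online_images(query, gallery, top_k=20))
```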