Abstract

In sharp contrast to traditional category- or subcategory-level image retrieval, product image search aims to find images containing the exact same product. This is a challenging problem because, in addition to being robust under different imaging conditions such as varying viewpoints and illumination changes, the features must also distinguish a specific product among many similar products. Consequently, it is important to utilize a large dataset, containing many product classes, to learn a strongly discriminative representation. Building such a dataset normally requires laborious manual annotation. Toward learning fine-grained, robust, discriminative features for product image search, we present a novel paradigm that can construct the required dataset without any human annotation. Unlike other fine-grained recognition works that rely on high-quality annotated datasets and focus narrowly on a single object category, our method handles multiple object classes and requires minimal human effort. First, an ImageNet-pretrained model is used to generate product clusters. Because the original ImageNet features are not sufficiently discriminative, the clusters generated by this unsupervised procedure contain considerable noise. We alleviate this noise by explicitly modeling the noise distribution and automatically detecting errors during learning. The proposed paradigm is general, requires minimal human effort, and is applicable to any deep learning task where fine-grained discriminative features are desired. Extensive experiments on the ALISC dataset demonstrate that our approach is sound and effective, surpassing the baseline GoogLeNet model by 15.09%.
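The first stage described above, clustering features from a pretrained model to obtain pseudo-labels without human annotation, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the synthetic vectors stand in for embeddings that would come from an ImageNet-pretrained CNN, the number of clusters `k` is a hypothetical parameter, and plain k-means is used as a generic stand-in for whatever clustering procedure the authors employ.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Cluster feature vectors with plain k-means and return the
    cluster index of each sample as its pseudo-label."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct samples.
    centers = features[rng.choice(len(features), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned samples.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Synthetic stand-in for CNN embeddings: two well-separated blobs,
# mimicking images of two distinct products.
rng = np.random.default_rng(1)
feats = np.vstack([
    rng.normal(0.0, 0.1, (50, 8)),
    rng.normal(5.0, 0.1, (50, 8)),
])
pseudo_labels = kmeans_pseudo_labels(feats, k=2)
```

In the paper's setting the resulting pseudo-labels are noisy, which is exactly why the subsequent noise-modeling stage is needed before training a discriminative network on them.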
