Learning local feature representation from matching, clustering and spatial transform

Jianhan Mei,Xudong Jiang,Jianfei Cai

doi:10.1016/j.jvcir.2019.102601

Abstract

This paper focuses on learning the local image region representation via deep neural networks. Existing works mainly learn from matched corresponding image patches, with which the learned feature is too sensitive to the individual local patch matching result and cannot handle aggregation based tasks such as image level retrieval. Thus, we propose to use both the matched corresponding image patches and the clustering result as labels for the network training. To resolve the inconsistency between the matched correspondences and clustering results, we propose a semi-supervised iterative training scheme together with a dual margins loss. Moreover, a jointly learned spatial transform prediction network is utilized to obtain better spatial transform invariance of the learned local features. Using SIFT as the label initializer, experimental results show the comparable or even better performance than the hand-crafted feature, which sheds lights on learning local feature representation in an unsupervised or weakly supervised manner.

Full Text