Abstract

Local feature descriptor learning aims to represent distinctive image patches with compact local features whose representation is invariant under different types of deformation. Recent studies have demonstrated that descriptor learning based on Convolutional Neural Networks (CNNs) can improve matching performance significantly. However, these methods tend to ignore the importance of sample selection during training, which leads to unstable descriptor quality and learning efficiency. In this paper, a dual hard batch construction method is proposed to sample hard matching and non-matching examples for training, improving descriptor learning performance on different tasks. To construct the dual hard training batches, the matching examples with the minimum similarity are first selected as hard positive pairs. For each positive pair, the most similar non-matching example is then sampled from the hard positive pairs in the same batch as the corresponding hard negative. By combining the hard positive pairs with their corresponding hard negatives, the resulting hard batches force the CNN model to work harder to learn the descriptors. In addition, based on this dual hard batch construction, an ℓ₂² triplet loss function is built to optimize the training model. Specifically, we analyze the superiority of the ℓ₂² loss function when dealing with hard examples and also demonstrate it experimentally. With the benefits of the proposed sampling strategy and the ℓ₂² triplet loss function, our method achieves better performance than the state of the art on reference benchmarks for different matching tasks.
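The two-step sampling the abstract describes (hardest positives first, then in-batch hardest negatives) combined with a squared-L2 triplet loss can be sketched as follows. This is a minimal NumPy illustration under assumed conventions, not the paper's implementation: `build_dual_hard_batch`, the margin value, and the use of cosine similarity over L2-normalized descriptors are all illustrative choices.

```python
import numpy as np

def build_dual_hard_batch(anchors, positives, k, margin=1.0):
    """Illustrative sketch of dual hard batch construction (hypothetical API).

    anchors, positives: (N, D) arrays of L2-normalized descriptors for N
    matching pairs; k is the hard-batch size. Returns the batch loss.
    """
    # Step 1 -- hard positives: pick the k matching pairs with the
    # MINIMUM similarity, i.e. the matches that are hardest to make.
    pos_sim = np.sum(anchors * positives, axis=1)   # per-pair cosine similarity
    hard_idx = np.argsort(pos_sim)[:k]              # k least-similar pairs
    a, p = anchors[hard_idx], positives[hard_idx]

    # Step 2 -- hard negatives: for each anchor, the most similar
    # NON-matching descriptor within the same hard batch.
    sim = a @ p.T                                   # (k, k) cross-similarity
    np.fill_diagonal(sim, -np.inf)                  # exclude the true match
    n = p[np.argmax(sim, axis=1)]

    # Squared-L2 triplet loss with margin, averaged over the batch.
    d_pos = np.sum((a - p) ** 2, axis=1)
    d_neg = np.sum((a - n) ** 2, axis=1)
    return np.mean(np.maximum(0.0, margin + d_pos - d_neg))
```

Because the negatives are drawn from the already-hard positives of the same batch, every triplet in the batch is difficult by construction, which is the intuition behind the "dual" in dual hard batch construction.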
