Abstract

Keypoint matching is an important operation in computer vision and its applications, such as visual simultaneous localization and mapping (SLAM) in robotics. This matching operation depends heavily on the keypoints' descriptors, and it must be performed reliably when images undergo condition changes such as those in illumination and viewpoint. Previous research in keypoint description has pursued three classes of descriptors: hand-crafted descriptors, descriptors from trained convolutional neural networks (CNNs), and descriptors from pre-trained CNNs. This paper provides a comparative study of the three classes of keypoint descriptors in terms of their ability to handle condition changes. The study is conducted on recent computer-vision benchmark datasets with challenging condition changes. Our study finds that (a) in general, CNN-based descriptors outperform hand-crafted descriptors; (b) trained CNN descriptors perform better than pre-trained CNN descriptors under viewpoint changes; and (c) pre-trained CNN descriptors perform better than trained CNN descriptors under illumination changes. These findings can serve as a basis for selecting appropriate keypoint descriptors for various applications.
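To make the matching operation concrete, the following minimal Python sketch (not code from the paper) detects keypoints, computes descriptors, and matches them between two images using OpenCV. ORB stands in here for the hand-crafted descriptor class discussed above, and the image file names are hypothetical.

```python
import cv2

# Load two views of the same scene (hypothetical file names).
img1 = cv2.imread("scene_day.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_night.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute hand-crafted (ORB) binary descriptors.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with Hamming distance (appropriate for binary
# descriptors); cross-check keeps only mutually nearest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Sort by descriptor distance; smaller distance means a closer match.
matches = sorted(matches, key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance = {matches[0].distance}")
```

A CNN-based descriptor would replace the ORB description step with a network forward pass and use a float metric (e.g., L2) instead of Hamming distance; the matching step itself is otherwise unchanged, which is what makes the descriptor classes directly comparable.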
