Abstract

Keypoint matching is an important operation in computer vision and its applications, such as visual simultaneous localization and mapping (SLAM) in robotics. This matching operation depends heavily on the keypoints' descriptors, and it must be performed reliably when images undergo condition changes such as those in illumination and viewpoint. Previous research in keypoint description has pursued three classes of descriptors: hand-crafted descriptors, descriptors from trained convolutional neural networks (CNNs), and descriptors from pre-trained CNNs. This paper provides a comparative study of the three classes of keypoint descriptors in terms of their ability to handle condition changes. The study is conducted on recent computer-vision benchmark datasets with challenging condition changes. Our study finds that (a) in general, CNN-based descriptors outperform hand-crafted descriptors; (b) trained CNN descriptors perform better than pre-trained CNN descriptors under viewpoint changes; and (c) pre-trained CNN descriptors perform better than trained CNN descriptors under illumination changes. These findings can serve as a basis for selecting appropriate keypoint descriptors for various applications.
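To make the matching operation concrete, the following minimal Python sketch (not code from the paper) detects keypoints, computes descriptors, and matches them between two images using OpenCV. ORB stands in here for the hand-crafted descriptor class discussed above, and the image file names are hypothetical.

```python
import cv2

# Load two views of the same scene (hypothetical file names).
img1 = cv2.imread("scene_day.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_night.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute hand-crafted (ORB) binary descriptors.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with Hamming distance (appropriate for binary
# descriptors); cross-check keeps only mutually nearest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Sort by descriptor distance; smaller distance means a closer match.
matches = sorted(matches, key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance = {matches[0].distance}")
```

A CNN-based descriptor would replace the ORB description step with a network forward pass and use a float metric (e.g., L2) instead of Hamming distance; the matching step itself is otherwise unchanged, which is what makes the descriptor classes directly comparable.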
