Abstract

In view of the poor performance of traditional feature point detection methods in low-texture situations, we design a new self-supervised feature extraction network that can be applied to the visual odometer (VO) front-end feature extraction module based on the deep learning method. First, the network uses the feature pyramid structure to perform multi-scale feature fusion to obtain a feature map containing multi-scale information. Then, the feature map is passed through the position attention module and the channel attention module to obtain the feature dependency relationship of the spatial dimension and the channel dimension, respectively, and the weighted spatial feature map and the channel feature map are added element by element to enhance the feature representation. Finally, the weighted feature maps are trained for detectors and descriptors respectively. In addition, in order to improve the prediction accuracy of feature point locations and speed up the network convergence, we add a confidence loss term and a tolerance loss term to the loss functions of the detector and descriptor, respectively. The experiments show that our network achieves satisfactory performance under the Hpatches dataset and KITTI dataset, indicating the reliability of the network.

Highlights

  • In computer vision-based applications such as simultaneous localization and mapping (SLAM), structure-from-motion (SFM), and image retrieval, the processing of image feature points determines the correspondence between different images

  • We add softargmax to improve the prediction accuracy of feature points and add a confidence loss term to ensure the reliability of feature points

  • Compared with traditional algorithms and related deep learning-based algorithms, our network has a significant improvement in accuracy due to the addition of feature pyramid networks (FPN) and attention modules to optimize feature maps

Read more

Summary

Introduction

The detection of feature points and the establishment of descriptors are important steps in image matching. In computer vision-based applications such as simultaneous localization and mapping (SLAM), structure-from-motion (SFM), and image retrieval, the processing of image feature points determines the correspondence between different images. Accurate extraction of feature points can improve the matching accuracy of images. With the wide applications of computer vision and the more complex environment faced by image processing, it is important to find a stable feature point detection method. The processing methods for image feature points can be divided into traditional methods and deep learning-based methods. Traditional feature extraction methods are difficult to achieve satisfactory performance in challenging situations. The scale invariant feature transform (SIFT) algorithm [1] was scale invariant but not real-time

Methods
Findings
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.