Abstract

Autonomous robot visual navigation is a fundamental locomotion task based on extracting relevant features from images of the surrounding environment to control an independent displacement. During navigation, the use of a known visual map helps obtain an accurate localization; in the absence of this map, a guided or free exploration path must be executed to obtain the image sequence representing the visual map. This paper presents an appearance-based localization method based on a visual map and an end-to-end Convolutional Neural Network (CNN). The CNN is initialized via transfer learning (trained on the ImageNet dataset), and four state-of-the-art CNN architectures are evaluated: VGG16, ResNet50, InceptionV3, and Xception. A typical transfer-learning pipeline replaces the last layer so that the number of neurons matches the number of custom classes. In this work, the dense layers after the convolutional and pooling layers were substituted by a Global Average Pooling (GAP) layer, which is parameter-free. Additionally, an L2-norm constraint was added to the GAP feature descriptors, constraining the features to lie on a fixed-radius hypersphere. These pre-trained configurations were analyzed and compared using two visual maps from the CIMAT-NAO datasets, consisting of 187 and 94 images, respectively. For evaluating the localization task, sets of 278 and 94 images were available for the two visual maps, respectively. The numerical results show that integrating the L2-norm constraint into the training pipeline boosts the appearance-based localization performance. Specifically, the pre-trained VGG16 and Xception networks achieved the best localization results, reaching top-3 accuracies of 90.70% and 93.62% on the two datasets, respectively, outperforming the referenced approaches based on hand-crafted feature extractors.
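As a rough illustration of the architecture modification described in the abstract, the following Keras sketch removes VGG16's dense head, adds a parameter-free Global Average Pooling layer, and L2-normalizes the resulting descriptors (scaled to a fixed-radius hypersphere) before the softmax classifier. The class count, scaling factor alpha, and input size are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of GAP + L2-norm transfer learning (assumes TensorFlow/Keras;
# num_classes and alpha are illustrative, not the paper's exact settings).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

num_classes = 187  # e.g., one class per key-image in the first visual map
alpha = 16.0       # radius of the hypersphere the normalized features lie on

# Pre-trained convolutional base (ImageNet weights), dense head removed.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

inputs = layers.Input(shape=(224, 224, 3))
x = base(inputs)
x = layers.GlobalAveragePooling2D()(x)  # parameter-free replacement for dense layers
x = layers.Lambda(lambda t: alpha * tf.math.l2_normalize(t, axis=1))(x)  # L2-norm constraint
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

Scaling the normalized descriptors by a fixed alpha is the usual way an L2-softmax loss is realized: without it, features constrained to the unit sphere limit how confident the softmax can become.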

Highlights

  • Autonomous navigation is a highly desired capability in mobile robotics because it allows a robot to move from an initial position towards a desired target without external intervention in changing environments

  • This paper evaluates four state-of-the-art network architectures, pre-trained for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), as feature extractors

  • In this paper, four different state-of-the-art Convolutional Neural Network (CNN) architectures were evaluated for the humanoid robot appearance-based localization problem


Summary

INTRODUCTION

Autonomous navigation is a highly desired capability in mobile robotics because it allows a robot to move from an initial position towards a desired target without external intervention in changing environments. The methodology considered here represents the whole environment as a collection of indexed images in a directed graph: a visual map is built by selecting a subset of images (key-images) from a learning sequence, and an autonomous navigation stage allows the robot to move to the location associated with a desired key-image by freely following the predefined visual path. During the robot's displacements, the biped locomotion produces blurred images, and the sway motion induces image rotation around the camera's optical axis.
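A toy illustration of such a visual map, under the assumption that it is stored as an indexed sequence of L2-normalized key-image descriptors and that appearance-based localization reduces to a nearest-neighbor search in descriptor space. The class and function names below are hypothetical, not from the paper.

```python
# Hypothetical sketch: visual map as an indexed set of key-image descriptors,
# with localization as top-k nearest-neighbor retrieval.
import numpy as np

class VisualMap:
    def __init__(self, key_descriptors: np.ndarray):
        # key_descriptors: (num_key_images, d) array of L2-normalized descriptors
        self.keys = key_descriptors

    def localize(self, query: np.ndarray, top_k: int = 3) -> np.ndarray:
        """Return indices of the top-k key-images most similar to the query.

        With L2-normalized descriptors, maximizing the dot product (cosine
        similarity) is equivalent to minimizing the Euclidean distance.
        """
        sims = self.keys @ query           # similarity to every key-image
        return np.argsort(-sims)[:top_k]   # indices of the best matches

# Usage with random stand-in descriptors (187 key-images, 512-dim features):
rng = np.random.default_rng(0)
keys = rng.normal(size=(187, 512))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
query = keys[42] + 0.05 * rng.normal(size=512)   # perturbed view of key-image 42
query /= np.linalg.norm(query)
print(VisualMap(keys).localize(query))           # expect 42 among the top-3
```

Reporting a top-3 accuracy, as the paper does, corresponds to counting a query as correctly localized whenever the true key-image appears among these three retrieved indices.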

RELATED WORK
PROPOSED VISUAL LOCALIZATION METHOD
FEATURE DESCRIPTOR EXTRACTION
L2-SOFTMAX LOSS
LOCALIZATION PROCESS
RESULTS AND DISCUSSION
PERFORMANCE COMPARISON
CONCLUSION
