Abstract
This paper presents a visual navigation method that determines the position and orientation of a ground robot using a diffusion map of robot images (obtained from a camera in an elevated position, e.g., on a tower or a drone). It also investigates robot stability with respect to desirable paths under control with a time delay. The time delay arises from the image processing required for visual navigation. We consider the diffusion map as a possible alternative to the currently popular deep learning and compare the capabilities of these two methods for visual navigation of ground robots. The diffusion map projects an image (described by a point in a multidimensional space) onto a low-dimensional manifold while preserving the mutual relationships between the data. We find the ground robot's position and orientation as a function of the coordinates of the robot image on the low-dimensional manifold obtained from the diffusion map. We compare these coordinates with coordinates obtained from deep learning. In this comparison, the diffusion map algorithm has higher accuracy and is not sensitive to changes in lighting, the appearance of external moving objects, and other phenomena. However, the diffusion map needs a longer computation time than deep learning. We consider possible future steps for reducing this computation time.
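To make the projection step concrete, the following is a minimal sketch of a basic diffusion map in Python with NumPy. It is not the authors' implementation: the function name `diffusion_map`, the parameters `epsilon` and `t`, the Gaussian kernel with a median-distance bandwidth, and the synthetic "images" are all assumptions chosen for illustration. Each row of the input stands in for one flattened robot image.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed the rows of X into n_components diffusion coordinates.

    Hypothetical helper for illustration; kernel and bandwidth are assumed.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip small negatives caused by rounding

    # Assumed heuristic: bandwidth = median of the nonzero squared distances.
    if epsilon is None:
        epsilon = np.median(d2[d2 > 0])

    # Gaussian affinity between samples.
    K = np.exp(-d2 / epsilon)

    # The Markov matrix is P = D^-1 K. We diagonalise the symmetric matrix
    # S = D^-1/2 K D^-1/2 instead, which has the same eigenvalues as P.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)

    # Sort eigenpairs from largest to smallest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Right eigenvectors of P are recovered as D^-1/2 v.
    psi = eigvecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining ones by eigenvalue**t to obtain diffusion coordinates.
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return coords, eigvals

# Synthetic stand-in for images of a robot at 60 different orientations:
# points on a circle, lifted linearly into a 50-dimensional "pixel" space.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
images = circle @ rng.normal(size=(2, 50))

coords, eigvals = diffusion_map(images, n_components=2)
print(coords.shape)
```

In a setting like the one described above, a separate regression step would then map these low-dimensional coordinates to robot position and orientation; that step is not shown here.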
Highlights
Deep learning [1,2,3,4,5,6,7] is a very popular and powerful instrument for solving complex problems of classification and function regression. The main advantage of this method is that we need not develop complex features describing the group of investigated objects
We find the ground robot coordinates as a function of coordinates of the robot image on the low-dimensional manifold obtained from the diffusion map
We offer an important practical example, demonstrating that the diffusion map can compete with deep learning
Summary
Deep learning (based on artificial neural networks) [1,2,3,4,5,6,7] is a very popular and powerful instrument for solving complex problems of classification and function regression. The main advantage of this method is that we need not develop complex features describing the group of investigated objects. “PoseNet [1] is based on the GoogLeNet architecture. It processes RGB images and is modified so that all three softmax and fully connected layers are removed from the original model and replaced by regressors in the training phase. In the testing phase the other two regressors of the lower layers are removed and the prediction is done solely based on the regressor on the top of the whole network.”