Abstract

In recent years, visual-inertial simultaneous localization and mapping (SLAM) has been widely researched and applied. Monocular depth estimation and visual odometry (VO) play a significant role in SLAM systems because of their low cost and high efficiency, and they can be used to analyze indoor environments for intelligent mobile robot applications. Existing methods generally rely on the photometric difference between consecutive frames to recover the scene structure and camera pose. However, attending to all regions of the image indiscriminately wastes computation and makes it hard for the network to obtain ideal results. To overcome these disadvantages, an attention mechanism is added to the original network and several convolution operations are modified, so that the network focuses on the key areas of the image and the estimation accuracy improves. The proposed approach is evaluated on the KITTI dataset and achieves better results than state-of-the-art methods.
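
The abstract does not specify the exact form of the attention module, so the following is only a minimal illustrative sketch, assuming a squeeze-and-excitation style channel-attention block inserted into the depth/pose encoder; the class name, reduction ratio, and placement are assumptions, not the authors' implementation.

```python
# Sketch of a channel-attention block (SE-style), assuming PyTorch.
# Purely illustrative: the paper's actual attention design may differ.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weights feature channels so the network emphasizes informative regions."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global spatial average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # scale features by learned attention

if __name__ == "__main__":
    feats = torch.randn(2, 64, 48, 160)                 # a KITTI-like encoder feature map
    attn = ChannelAttention(64)
    print(attn(feats).shape)                            # torch.Size([2, 64, 48, 160])
```

In this kind of design the attention weights are learned jointly with the depth and pose losses, so regions that contribute most to photometric consistency receive larger weights without extra supervision.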
