Abstract
Visual navigation (vNavigation) is a key and fundamental technology for artificial agents to interact with the environment and achieve advanced behaviors. Visual navigation for artificial agents with deep reinforcement learning (DRL) is a new research hotspot in artificial intelligence and robotics that incorporates the decision making of DRL into visual navigation. Visual navigation via DRL is an end-to-end method that directly receives high-dimensional images and generates an optimal navigation policy. In this paper, we first present an overview of reinforcement learning (RL), deep learning (DL), and deep reinforcement learning (DRL). Then, we systematically describe five main categories of visual DRL navigation: direct DRL vNavigation, hierarchical DRL vNavigation, multi-task DRL vNavigation, memory-inference DRL vNavigation, and vision-language DRL vNavigation. These visual DRL navigation algorithms are reviewed in detail. Finally, we discuss the challenges and some possible opportunities for visual DRL navigation of artificial agents.
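To make the end-to-end formulation concrete, the sketch below shows the shape of such a mapping in PyTorch: a policy network that takes a raw image and returns a distribution over navigation actions. The class name, the 84×84 input size, the layer dimensions, and the four-action space are our own illustrative assumptions rather than an architecture from any of the surveyed papers; a DRL algorithm (e.g., a policy-gradient method) would then optimize this mapping from reward.

```python
import torch
import torch.nn as nn

class NavPolicy(nn.Module):
    """End-to-end visual navigation policy: raw RGB image in, action distribution out."""

    def __init__(self, num_actions: int = 4):
        super().__init__()
        # Convolutional encoder for the high-dimensional image observation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Policy head producing logits over discrete navigation actions.
        self.policy_head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 follows from the 84x84 input
            nn.Linear(512, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        logits = self.policy_head(self.encoder(obs))
        return torch.distributions.Categorical(logits=logits)

# Sample an action for one 84x84 RGB observation; a DRL algorithm would
# update the network weights so that sampled actions maximize reward.
policy = NavPolicy(num_actions=4)  # e.g. move forward/backward, turn left/right
obs = torch.rand(1, 3, 84, 84)     # placeholder for a camera image batch
action = policy(obs).sample()
```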
Highlights
Artificial agents refer to software or hardware entities that can perform actions in an environment independently, and include virtual robots (such as characters in games and entities in virtual environments) and real robots (such as service robots, industrial robots, and unmanned vehicles).
Although laser simultaneous localization and mapping (SLAM) has achieved some success in recent years, the high price of laser sensors hinders the practical application of laser SLAM, and the efficiency of laser SLAM is susceptible to poor weather conditions.
Summary
Artificial agents refer to software or hardware entities that can perform actions in an environment independently, and include virtual robots (such as characters in games and entities in virtual environments) and real robots (such as service robots, industrial robots, and unmanned vehicles). Both PTAM and ORB-SLAM are based on feature extraction, but feature-based methods cannot handle low-texture images well. To address this issue, Engel et al. [8] proposed LSD-SLAM, a direct (feature-less) visual SLAM algorithm that enables the construction of large-scale, consistent maps of the environment. One prominent issue with these traditional approaches is their susceptibility to accumulated sensor noise, which propagates down the pipeline from mapping and localization to path planning and leaves these algorithms with less robust performance. They also require extensive case-specific, scenario-driven manual engineering, making traditional navigation difficult to integrate with other downstream artificial intelligence tasks that have achieved superior performance with learning methods, such as visual recognition, question answering, and other advanced intelligent tasks [10].
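As a worked illustration of this accumulation effect, the toy simulation below (a sketch of our own, with hypothetical stage names and noise levels, not a model of any specific SLAM system) shows how the mean estimation error grows as independent noise is injected at each successive pipeline stage.

```python
import random

# Toy simulation (our own construction, not from the surveyed papers): each
# pipeline stage adds independent zero-mean noise to the state estimate, so
# the error handed to the final path planner accumulates every upstream
# stage's noise.
random.seed(0)
STAGE_NOISE = {"sensing": 0.3, "mapping": 0.3, "localization": 0.3}
N_RUNS = 100_000

mean_abs_error = {stage: 0.0 for stage in STAGE_NOISE}
for _ in range(N_RUNS):
    estimate = 0.0  # the true value is 0, so |estimate| is the error
    for stage, sigma in STAGE_NOISE.items():
        estimate += random.gauss(0.0, sigma)
        mean_abs_error[stage] += abs(estimate) / N_RUNS

for stage, err in mean_abs_error.items():
    print(f"mean |error| after {stage:>12}: {err:.3f}")
# The mean error grows at every stage, the accumulation effect described above.
```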