Abstract

We address the problem of repeated coverage, by a team of robots, of the boundaries of a target area and of the structures inside it. Events may occur on any part of the boundaries and may carry different importance weights. Moreover, the boundaries of the area and of the structures are heterogeneous: events may appear with varying probabilities on different parts of the boundary, and these probabilities may change over time. The goal is to maximize the reward by detecting as many events as possible, weighted by their importance, in minimum time; the reward a robot receives for detecting an event depends on how early the event is detected. To this end, each robot autonomously and continuously learns the pattern of event occurrence on the boundaries over time, capturing the uncertainties in the target area. Based on the policy being learned to maximize the reward, each robot then plans, in a decentralized manner, the best path through the target area at that time so as to visit the most promising parts of the boundary. The performance of the learning algorithm is compared with a heuristic algorithm for the Travelling Salesman Problem in terms of the total reward collected by the team during a finite repeated boundary coverage mission.
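As a rough illustration of the kind of policy the abstract describes, the following Python sketch shows one robot maintaining a running estimate of the event probability on each boundary segment and greedily visiting the segment with the highest expected weighted reward. This is not the authors' algorithm; the segment count, importance weights, decay factor and the staleness-based reward term are all hypothetical assumptions made only for the example.

import random

# Minimal sketch (not the paper's method): estimate per-segment event
# rates online and greedily pick the most promising segment to visit.
ALPHA = 0.1          # assumed learning rate for the running estimate
NUM_SEGMENTS = 8     # hypothetical number of boundary segments

segment_rate = [0.5] * NUM_SEGMENTS                     # estimated event probability per visit
importance = [1.0, 2.0, 1.0, 3.0, 1.0, 1.0, 2.0, 1.0]   # assumed event importance weights
last_visit = [0] * NUM_SEGMENTS                         # time step of the last visit

def update_rate(seg, event_observed):
    """Blend the new observation into the running per-segment estimate."""
    segment_rate[seg] = (1 - ALPHA) * segment_rate[seg] + ALPHA * float(event_observed)

def choose_segment(t):
    """Pick the segment with the highest expected weighted reward.

    The estimate is scaled by how long a segment has gone unvisited,
    a simple stand-in (assumed here) for rewarding early detection.
    """
    def expected_reward(seg):
        staleness = t - last_visit[seg]
        return importance[seg] * segment_rate[seg] * (1 + staleness)
    return max(range(NUM_SEGMENTS), key=expected_reward)

# Toy simulation of one robot repeatedly covering the boundary.
true_rate = [0.1, 0.6, 0.2, 0.8, 0.1, 0.3, 0.5, 0.2]    # hidden ground-truth rates
total_reward = 0.0
for t in range(1, 200):
    seg = choose_segment(t)
    event = random.random() < true_rate[seg]
    if event:
        total_reward += importance[seg]
    update_rate(seg, event)
    last_visit[seg] = t

print(f"collected reward: {total_reward:.1f}")
print("learned rates:", [round(r, 2) for r in segment_rate])

In the paper's setting this kind of selection would additionally account for travel cost between boundary parts and would run on every robot in a decentralized fashion; the sketch omits both to stay minimal.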
