Abstract

In this paper, we consider a package delivery drone that is also tasked with aerial monitoring as a secondary mission. To integrate this secondary mission, we use a reward function representing the value of the information gathered via aerial monitoring. We define the pickup and delivery tasks with time window temporal logic (TWTL) specifications and use reinforcement learning (RL) to maximize the expected sum of rewards. The high-level decision-making of the drone is modeled as a Markov decision process (MDP). We extend previous work in which a model-free RL algorithm was used to solve this optimization problem, and we propose a modified Dyna-Q algorithm to address the shortage of online samples. We provide extensive simulation results that compare the performance of the model-free and hybrid RL algorithms in this application and investigate the effect of different system parameters on overall performance.
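
For readers unfamiliar with the hybrid approach mentioned above, the following is a minimal sketch of the standard tabular Dyna-Q algorithm, not the modified variant proposed in the paper, and it does not enforce any TWTL specification. After each real transition it performs a Q-learning update, stores the transition in a learned model, and then replays a few simulated transitions from that model, which is how Dyna-Q compensates for a shortage of online samples. The environment `chain_step`, the function name `dyna_q`, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def dyna_q(step, actions, start, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Standard tabular Dyna-Q for a deterministic environment (illustrative only)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    model = {}              # (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # Epsilon-greedy action selection on the real environment.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)      # direct RL from the real sample
            model[(s, a)] = (r, s2, done)  # model learning
            for _ in range(planning_steps):
                # Planning: replay a previously observed transition.
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def chain_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    nxt = min(4, state + 1) if action == 1 else max(0, state - 1)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


q = dyna_q(chain_step, actions=[0, 1], start=0)
policy = {s: max([0, 1], key=lambda a: q[(s, a)]) for s in range(4)}
```

On this toy chain, the learned greedy `policy` should select action 1 (move right) in every non-terminal state. Setting `planning_steps=0` reduces the sketch to plain model-free Q-learning, which is the kind of baseline the hybrid algorithm is compared against.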
