Abstract

Assuring the stability of the guidance law for quadrotor-type Urban Air Mobility (UAM) is important since such vehicles are expected to operate in urban areas. Model-free reinforcement learning has been applied intensively for this purpose in recent studies. In reinforcement learning, the environment is an important part of training. The Proximal Policy Optimization (PPO) algorithm is widely used for reinforcement learning of quadrotors. However, PPO tends to fail to guarantee the stability of the guidance law as the search space of the environment grows. In this work, we show improved stability in a multi-agent quadrotor-type UAM environment by applying the Soft Actor-Critic (SAC) reinforcement learning algorithm. The simulations were performed in Unity. Our approach achieved three times higher reward in the UAM environment than training with the PPO algorithm, and it also trained faster than PPO.

Keywords: Deep reinforcement learning · Multi-agent system · Traffic network · Urban Air Mobility (UAM) · Proximal Policy Optimization (PPO) · Soft Actor-Critic (SAC)
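
For context, the following is a minimal sketch of how SAC training against a Unity environment might be wired up. It is not the authors' implementation: it assumes the Unity scene is exposed through ML-Agents' single-agent Gym wrapper and uses Stable-Baselines3's off-the-shelf SAC; the executable name and all hyperparameters are illustrative.

    # Minimal sketch (not the authors' code): SAC on a Unity environment
    # via ML-Agents' Gym wrapper and Stable-Baselines3.
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
    from stable_baselines3 import SAC

    # Connect to a built Unity executable (the file name is hypothetical).
    unity_env = UnityEnvironment(file_name="UAMQuadrotor")
    env = UnityToGymWrapper(unity_env)

    # Off-policy SAC: entropy-regularized actor-critic with a replay buffer,
    # the algorithm the paper contrasts against on-policy PPO.
    model = SAC(
        "MlpPolicy",
        env,
        learning_rate=3e-4,      # illustrative defaults, not the paper's values
        buffer_size=1_000_000,
        ent_coef="auto",         # automatic entropy-temperature tuning
        verbose=1,
    )
    model.learn(total_timesteps=1_000_000)
    model.save("sac_uam_guidance")

Because SAC is off-policy and reuses transitions from its replay buffer, it can be more sample-efficient than PPO's on-policy updates, which is consistent with the faster training the abstract reports.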
