Abstract

With the maturation of Internet of Things (IoT) technology, IoT applications have become widespread in the field of intelligent vehicles, and artificial intelligence algorithms, especially deep reinforcement learning (DRL) methods, are increasingly applied to autonomous driving. Early work applied a large number of DRL techniques to the behavior-planning module of single-vehicle autonomous driving. However, autonomous driving takes place in an environment where multiple intelligent vehicles coexist, interact with each other, and change dynamically. In such an environment, multiagent reinforcement learning (MARL) is one of the most promising technologies for solving the coordinated behavior-planning problem of multiple vehicles, yet research on this topic remains rare. This paper introduces a dynamic coordination graph (CG) convolution technique for the cooperative learning of multiple intelligent vehicles. The method dynamically constructs a CG model among the vehicles, effectively reducing the influence of unrelated vehicles and simplifying the learning process. The relationships between vehicles are refined with an attention mechanism, and graph convolutional RL is used to emulate a message-passing aggregation algorithm that maximizes local utilities so as to obtain the maximum joint utility and guide coordinated learning. Driving samples are used as training data, and a model guided by reward shaping is combined with a model-free graph convolutional RL method, which enables the proposed approach to achieve strong asymptotic performance and improved learning efficiency. In addition, because the graph convolutional RL algorithm shares parameters between agents, it scales easily to large multiagent systems such as traffic environments. Finally, the proposed algorithm is tested and verified on the multivehicle cooperative lane-changing problem in an autonomous-driving simulation environment. Experimental results show that the proposed method has a better value-function representation and learns better coordinated driving policies than traditional dynamic coordination algorithms.
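The core mechanism summarized above, attention-weighted message passing over a dynamically constructed coordination graph with parameters shared across agents, can be sketched roughly as follows. This is a minimal illustrative sketch in the spirit of graph convolutional RL, not the authors' implementation: the layer sizes, the single attention head, the distance-based graph construction, and all names (`AttentionConvLayer`, `GraphConvQNet`) are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionConvLayer(nn.Module):
    """One attention-based graph convolution layer, shared by all agents.

    Each vehicle aggregates the feature embeddings of its coordination-graph
    neighbors, weighted by learned attention scores, so unrelated vehicles
    (non-neighbors) contribute nothing to the update.
    """

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # query: the receiving vehicle
        self.k_proj = nn.Linear(dim, dim)  # key: a neighboring vehicle
        self.v_proj = nn.Linear(dim, dim)  # value: the message content

    def forward(self, h, adj):
        # h:   (n_agents, dim) per-vehicle feature embeddings
        # adj: (n_agents, n_agents) 0/1 coordination graph, rebuilt every
        #      step; assumed to include self-loops so no row is empty
        scores = self.q_proj(h) @ self.k_proj(h).T / h.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)      # attention over neighbors only
        return F.relu(attn @ self.v_proj(h))  # aggregated neighbor messages

class GraphConvQNet(nn.Module):
    """Observation encoder + two convolution layers + per-agent Q head.

    All weights are shared between agents, which is what lets the model
    scale to a variable, large number of vehicles.
    """

    def __init__(self, obs_dim, n_actions, dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, dim)
        self.conv1 = AttentionConvLayer(dim)
        self.conv2 = AttentionConvLayer(dim)
        self.q_head = nn.Linear(dim, n_actions)

    def forward(self, obs, adj):
        h = F.relu(self.encoder(obs))
        h = self.conv1(h, adj)   # one round of message passing
        h = self.conv2(h, adj)   # second round widens the receptive field
        return self.q_head(h)    # (n_agents, n_actions) local utilities

# Hypothetical usage: 5 vehicles, 10-dim observations, 3 lane actions,
# with the coordination graph built from inter-vehicle distance (here the
# first two observation features are assumed to be x/y position).
net = GraphConvQNet(obs_dim=10, n_actions=3)
obs = torch.randn(5, 10)
adj = (torch.cdist(obs[:, :2], obs[:, :2]) < 2.0).float()  # self-loops included
q_values = net(obs, adj)  # (5, 3)
```

Stacking two convolution layers lets each vehicle's utility depend on its two-hop neighborhood, which loosely mirrors how local utilities are combined toward a joint utility during coordinated learning.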
