Intelligent transportation equipment, represented by the Artificial Intelligence Robot of Transportation (ART), has been gradually applied to automated container terminals (ACTs). Compared with automated guided vehicles (AGVs), ARTs offer greater intelligence and distributed autonomous decision-making capability. ACT production and operation require strict loading plans based on container types and destination ports, which imposes tight constraints on the ART loading task sequence. ART travel time from the yard to the quay is highly uncertain owing to scenarios such as converging traffic flows and turning in narrow areas, and out-of-sequence arrivals at the quay can force ARTs into long waits. It is therefore necessary to regulate ART speed so as to adjust when each ART arrives at the quay. However, existing scheduling systems tend to ignore the impact of ART speed and loading-sequence constraints on terminal operations. To optimize the multi-ART control problem in ACTs, ART waiting time at the quay is taken as the optimization objective, and the speed regulation of each ART during a task is formulated as a Markov decision process. An adaptive decision-making method combining reinforcement learning with a multi-agent approach is then developed to generate the optimal speed-regulation strategy. The method interacts with a multi-agent simulation environment to obtain real-time status information and learns strategies online, improving both the intelligence of individual agents and the collaborative capability of the agent group. Finally, a multi-agent simulation model embedded with a reinforcement learning agent is established in AnyLogic. Simulation results show that the method effectively reduces ART waiting time, improves the coherence and efficiency of loading operations, and performs well in online scenarios involving emergency tasks.
The results indicate that ART speed in ACTs with a parallel yard layout can affect the operational efficiency of the system. Furthermore, an appropriately defined reward function and training method can enhance collaborative behaviour among equipment, ultimately optimizing overall system performance.
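The abstract does not give the paper's exact formulation. As an illustration only, the per-task speed-regulation decision can be sketched as a minimal tabular reinforcement-learning loop: the action is a discrete speed level, and the reward penalizes both early arrival (waiting at the quay for the loading sequence) and late arrival. All numeric values (speed levels, travel distance, crane-ready time, congestion noise) are hypothetical, not taken from the paper.

```python
import random

# Minimal illustrative sketch (NOT the paper's method): a single-state,
# bandit-style reinforcement-learning loop in which one ART chooses a
# discrete speed level for a yard-to-quay transport task.

SPEEDS = [4.0, 6.0, 8.0]   # m/s, assumed speed levels
DISTANCE = 600.0           # m, assumed yard-to-quay distance
CRANE_READY = 120.0        # s, assumed moment the quay crane expects the ART

def task_cost(speed):
    """Travel time with random congestion noise; cost = waiting + lateness."""
    travel = DISTANCE / speed + random.uniform(0.0, 30.0)
    wait = max(0.0, CRANE_READY - travel)   # arrived too early: waits at quay
    late = max(0.0, travel - CRANE_READY)   # arrived too late: delays loading
    return wait + late

def train(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy incremental Q-learning, one decision per task."""
    random.seed(seed)
    q = [0.0] * len(SPEEDS)
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.randrange(len(SPEEDS))               # explore
        else:
            a = max(range(len(SPEEDS)), key=q.__getitem__)  # exploit
        q[a] += alpha * (-task_cost(SPEEDS[a]) - q[a])      # incremental update
    return q

if __name__ == "__main__":
    q = train()
    best = max(range(len(SPEEDS)), key=q.__getitem__)
    print(f"learned speed: {SPEEDS[best]} m/s")
```

Under these assumed numbers the agent learns the intermediate speed, since both racing ahead (early arrival, long quay wait) and crawling (late arrival) are penalized; the paper's actual method extends this idea to multiple interacting agents trained online inside an AnyLogic simulation.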