Abstract

In classical autonomous racing, a perception, planning, and control pipeline is employed to navigate vehicles around a track as quickly as possible. In contrast, neural network controllers have been used to replace part of the pipeline or the pipeline in its entirety. This paper compares three deep learning architectures for F1Tenth autonomous racing: full planning, which replaces the global and local planners; trajectory tracking, which replaces the local planner; and end-to-end, which replaces the entire pipeline. The evaluation contrasts two reward signals, compares the DDPG, TD3, and SAC algorithms, and investigates how well the learned policies generalize to different test maps. Training the agents in simulation shows that the full planning agent has the most robust training and testing performance. The trajectory tracking agents achieve fast lap times on the training map but low completion rates on the test maps. Transferring the trained agents to a physical F1Tenth car reveals that the trajectory tracking and full planning agents transfer poorly, displaying rapid side-to-side swerving (slaloming). In contrast, the end-to-end agent, the worst performer in simulation, transfers best to the physical vehicle and completes the test track at speeds of up to 5 m/s. These results show that planning-based methods outperform end-to-end approaches in simulation, but that end-to-end approaches transfer better to physical robots.
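
The abstract does not specify the policy interfaces in code; as a rough sketch under assumed interfaces, the three architectures differ mainly in what the learned policy consumes. The input names, dimensions, and the stand-in linear "network" below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dummy_actor(x: np.ndarray, out_dim: int = 2) -> np.ndarray:
    """Stand-in for a trained actor network: a fixed random linear map.
    tanh squashes actions to [-1, 1], as DDPG/TD3/SAC actors typically do."""
    w = rng.standard_normal((out_dim, x.size)) * 0.01
    return np.tanh(w @ x)

# End-to-end: LiDAR scan -> [steering, speed]; replaces the whole pipeline.
def end_to_end(scan: np.ndarray) -> np.ndarray:
    return dummy_actor(scan)

# Trajectory tracking: scan plus upcoming reference waypoints -> [steering, speed];
# replaces only the local planner (a precomputed global racing line is assumed).
def trajectory_tracking(scan: np.ndarray, waypoints: np.ndarray) -> np.ndarray:
    return dummy_actor(np.concatenate([scan, waypoints.ravel()]))

# Full planning: scan plus estimated vehicle state -> [steering, speed];
# replaces the global and local planners but keeps classical state estimation.
def full_planning(scan: np.ndarray, state: np.ndarray) -> np.ndarray:
    return dummy_actor(np.concatenate([scan, state]))

scan = rng.random(20)           # downsampled LiDAR beams (assumed preprocessing)
waypoints = rng.random((5, 2))  # next 5 (x, y) points on an assumed racing line
state = rng.random(4)           # e.g. pose and velocity (assumed)

for name, action in [("end-to-end", end_to_end(scan)),
                     ("trajectory tracking", trajectory_tracking(scan, waypoints)),
                     ("full planning", full_planning(scan, state))]:
    print(f"{name}: steering={action[0]:+.2f}, speed_cmd={action[1]:+.2f}")
```

Under this framing, the sim-to-real gap the abstract reports can be read as an interface question: the trajectory tracking and full planning agents depend on planner-style intermediate representations, while the end-to-end agent maps sensor data directly to actuation.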
