Abstract

Reinforcement learning applications are spreading across domains, including autonomous vehicle control. The diverse situations that can arise during, for instance, a highway commute are practically infinite, so perfect coverage of all use-cases with labeled data is an ambitious goal. Moreover, the complex tasks and intricate scenarios faced during autonomous vehicle system design give rise to the credit assignment problem: how to construct appropriate objectives for the artificial intelligence to learn, and how to weigh the preferences between different goals, are matters of the designer’s choice. This work tackles the problem by utilizing successor features and providing a decomposition of the reward function that guides the agent’s actions. This method makes training easier for the agent and enables immediate, strong performance on new, combined tasks. Furthermore, with an optimal composition the desired behavior can be fine-tuned, and as an auxiliary gain, the decomposition supports different driving styles and makes driving preferences rapidly changeable. We introduce an adaptation of the FastRL algorithm to the autonomous vehicle domain, and we develop a stabilized way of using successor features, named DoubleFastRL. We compare our solution on a highway driving scenario against baseline agents, such as a Q-learning agent with multi-objective training.
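For readers unfamiliar with the successor-feature framework the abstract builds on, the sketch below illustrates (it is not the paper's implementation) why a reward decomposition makes driving preferences rapidly changeable: if the reward factors as r(s, a) = φ(s, a)ᵀw, then Q(s, a) = ψ(s, a)ᵀw, where ψ is the successor-feature representation, so a new preference vector w yields new Q-values without retraining. All names, shapes, and weights here are illustrative assumptions; the policy combination follows the standard generalized policy improvement step from the successor-feature literature.

```python
import numpy as np

# Minimal sketch of successor-feature policy composition (GPI-style).
# Assumptions: a discrete state/action space and pre-learned successor
# features psi[i][s, a, :] for each previously trained base policy i.
n_states, n_actions, n_features, n_policies = 10, 3, 4, 2
rng = np.random.default_rng(0)

# Placeholder "pre-learned" successor features, one tensor per base policy:
# psi[i][s, a] approximates the expected discounted sum of reward features
# when taking action a in state s and then following base policy i.
psi = [rng.random((n_states, n_actions, n_features)) for _ in range(n_policies)]

def act(state: int, w: np.ndarray) -> int:
    """Greedy action for a new preference vector w via generalized policy
    improvement: Q_i(s, a) = psi_i(s, a) . w, maximized over both the base
    policies and the actions."""
    q = np.stack([p[state] @ w for p in psi])  # shape: (n_policies, n_actions)
    return int(q.max(axis=0).argmax())

# Changing the driving style is just changing w -- no retraining needed.
w_comfort = np.array([1.0, 0.2, 0.0, 0.5])  # e.g., weight smoothness highly
w_sporty = np.array([0.2, 1.0, 0.8, 0.0])   # e.g., weight speed highly
print(act(state=3, w=w_comfort), act(state=3, w=w_sporty))
```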
