Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice

Candy A Huanca-Anquise, Ana Lúcia Cetertich Bazzan, Anderson R Tavares

doi:10.22456/2175-2745.122929

Abstract

Multi-objective decision-making in multi-agent scenarios poses multiple challenges. Two of them are dealing with multiple objectives and coping with the non-stationarity caused by simultaneous learning, and so far these have been addressed separately. In this work, we propose reinforcement learning algorithms that tackle both issues together and apply them to a route choice problem in which drivers must select an action in a single-state formulation while aiming to minimize both their travel time and their toll. Hence, we deal with repeated games, now with a multi-objective approach. We discuss the advantages, limitations, and differences of these algorithms. Our results show that the proposed reinforcement learning algorithms for action selection deal with both non-stationarity and multiple objectives, while providing alternative solutions to those of centralized methods.
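
To make the single-state formulation concrete, the following is a minimal sketch of one possible multi-objective bandit learner for a single driver. It is not one of the algorithms proposed in the paper. The class name `ParetoEpsilonGreedy`, the helper functions, the epsilon-greedy exploration rule, and the example cost values are all illustrative assumptions. Each arm is a route, each observation is a cost vector (travel time, toll), and the learner picks among the routes whose estimated cost vectors are not Pareto-dominated.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if cost vector a Pareto-dominates b (every objective is minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(costs: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose cost vectors are not dominated by any other arm."""
    return [i for i, c in enumerate(costs)
            if not any(dominates(o, c) for j, o in enumerate(costs) if j != i)]


class ParetoEpsilonGreedy:
    """Hypothetical single-state learner; an assumption, not the paper's method."""

    def __init__(self, n_arms: int, n_objectives: int,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.means = [[0.0] * n_objectives for _ in range(n_arms)]

    def select(self) -> int:
        # Try every route once before trusting the estimates.
        untried = [a for a, n in enumerate(self.counts) if n == 0]
        if untried:
            return untried[0]
        # Explore uniformly with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Otherwise pick uniformly among the estimated non-dominated routes.
        return self.rng.choice(pareto_front(self.means))

    def update(self, arm: int, cost: Sequence[float]) -> None:
        # Incremental sample average of the observed cost vector.
        self.counts[arm] += 1
        n = self.counts[arm]
        self.means[arm] = [m + (c - m) / n
                           for m, c in zip(self.means[arm], cost)]


# Assumed example: three routes with (travel time, toll) costs.
routes = [(10.0, 5.0), (15.0, 0.0), (20.0, 6.0)]
front = pareto_front(routes)  # the third route is dominated by the first
```

The sketch uses a sample average for simplicity. In a repeated game where other drivers learn at the same time, the observed costs drift, so a constant step size or another mechanism that discounts old observations may be more appropriate.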
