Abstract

This paper studies the multi-agent resource allocation problem in vehicular networks using non-orthogonal multiple access (NOMA) and network slicing. Vehicles need to broadcast multiple packets with heterogeneous quality-of-service (QoS) requirements: safety-related packets (e.g., accident reports) require very low-latency communication, while raw sensor data sharing (e.g., high-definition map sharing) requires high-throughput communication. To meet these heterogeneous service requirements, we propose a network slicing architecture. We focus on a non-cellular network scenario in which vehicles communicate in broadcast mode via the direct device-to-device interface (i.e., sidelink communication). In such a vehicular network, resource allocation among vehicles is particularly difficult, mainly because of (i) the rapid variation of the wireless channels among highly mobile vehicles and (ii) the lack of a central coordination point; acquiring instantaneous channel state information for centralized resource allocation is therefore precluded. The resulting resource allocation problem is complex: it includes not only the usual spectrum and power allocation, but also coverage selection (which target vehicles to broadcast to) and packet selection (which network slice to use). These decisions must be made jointly, because the selected packets can be superposed using NOMA, so spectrum and power must be carefully allocated to improve vehicle coverage. To this end, we first provide a mathematical programming formulation and a thorough NP-hardness analysis of the problem. We then model it as a multi-agent Markov decision process. Finally, to solve it efficiently, we adopt a deep reinforcement learning (DRL) approach and specifically propose a deep Q-learning (DQL) algorithm. The proposed DQL algorithm is practical because it can be implemented in an online and distributed manner. It is based on a cooperative learning strategy in which all agents perceive a common reward and thus learn cooperatively, in a distributed fashion, to improve the resource allocation solution through offline training. We show that our approach is robust and efficient under different variations of the network parameters and when compared to centralized benchmarks.
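As a concrete illustration of the cooperative DQL strategy sketched above, the following Python snippet shows a minimal per-agent deep Q-network with epsilon-greedy action selection and a one-step temporal-difference update driven by a common reward. All names, dimensions, and the environment interface here are illustrative assumptions rather than the paper's actual implementation; in particular, the discretized joint action set standing in for the (packet, coverage, spectrum, power) decisions is hypothetical.

```python
# Minimal sketch of one vehicle's cooperative DQL agent. Assumptions (not from
# the paper): local observations are fixed-size vectors, and the joint
# (packet, coverage, spectrum, power) decision is flattened into one discrete
# action index. The key cooperative ingredient is that every agent is updated
# with the SAME scalar reward.
import random
import torch
import torch.nn as nn
import torch.optim as optim


class QNetwork(nn.Module):
    """Maps a local observation (e.g., sensed channel and queue state) to
    Q-values over the agent's discretized action set."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class Agent:
    """One vehicle: epsilon-greedy action selection plus a one-step TD update."""

    def __init__(self, obs_dim: int, n_actions: int, gamma: float = 0.99):
        self.q = QNetwork(obs_dim, n_actions)
        self.target_q = QNetwork(obs_dim, n_actions)
        self.target_q.load_state_dict(self.q.state_dict())
        self.opt = optim.Adam(self.q.parameters(), lr=1e-3)
        self.gamma = gamma
        self.n_actions = n_actions

    def act(self, obs: torch.Tensor, epsilon: float) -> int:
        # Epsilon-greedy exploration over the discrete action set.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(obs).argmax().item())

    def update(self, obs, action, common_reward, next_obs, done):
        # Standard one-step DQN target. `common_reward` is the team reward
        # perceived by all agents, so each local gradient step pushes toward
        # the shared objective rather than a selfish one.
        q_sa = self.q(obs)[action]
        with torch.no_grad():
            target = common_reward + (1.0 - done) * self.gamma * self.target_q(next_obs).max()
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

In this sketch, offline training would step a (hypothetical) vehicular-network environment and call `update` on every agent with the same scalar reward; at execution time each vehicle only needs its own trained `QNetwork`, which is what permits the online, distributed operation described above.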
