To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates how to reduce the delay of edge information sharing over V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and to ensure fairness for individual users, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, in which a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. Finally, the effectiveness of the proposed approach is validated through extensive simulation experiments that compare its results with those of baseline approaches.
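As a rough illustration only, the two problems described above could be written in the following form. The notation is an assumption introduced here and is not taken from the paper: $K$ V2V links and $M$ V2I links, a joint policy $\pi$, $D_k^{\mathrm{V2V}}(\pi)$ for the delay of V2V link $k$, $D_m^{\mathrm{V2I}}(\pi)$ for the delay of V2I link $m$, and $D_m^{\max}$ for the delay requirement of V2I link $m$.

```latex
% Hypothetical notation; the paper's actual formulation may differ.
\begin{align}
\text{(P1, mean delay)}\quad
  & \min_{\pi}\ \frac{1}{K}\sum_{k=1}^{K} D_k^{\mathrm{V2V}}(\pi) \\
\text{(P2, maximum individual delay)}\quad
  & \min_{\pi}\ \max_{1 \le k \le K} D_k^{\mathrm{V2V}}(\pi) \\
\text{subject to (both problems)}\quad
  & D_m^{\mathrm{V2I}}(\pi) \le D_m^{\max}, \qquad m = 1, \dots, M .
\end{align}
```

In this sketch, (P1) corresponds to the global network performance objective and (P2) to the fairness objective, with the V2I delay requirements acting as constraints in both cases.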