Abstract

Unmanned aerial vehicle (UAV)-aided vehicular communication can be greatly facilitated by jointly optimizing the UAV trajectory design and the vehicle assignment. While most existing works assume full observation of the system state, we consider partial observability and predict the vehicles' trajectories. The vehicle trajectory prediction and the joint optimization problem are modeled as a partially observable Markov decision process (POMDP). To deal with the non-Markovian nature of the POMDP, we construct a new deep recurrent Q-network (DRQN) framework based on the deep Q-network (DQN) algorithm and a long short-term memory (LSTM) layer. Simulation results demonstrate that the proposed DRQN-based scheme converges quickly and outperforms the baseline schemes in terms of sum spectral efficiency.
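
To illustrate the general idea of placing an LSTM layer in front of a Q-value output, the following is a minimal, forward-pass-only sketch in plain Python. It is not the architecture from the paper: the class name `TinyDRQN`, the helper `epsilon_greedy`, the layer sizes, the random weight initialization, and the four-dimensional observations are all hypothetical choices made for illustration, and training (replay memory, target network, loss) is omitted. The point it shows is that the recurrent state carries the observation history, so the same partial observation can yield different Q-values at different times.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyDRQN:
    """Hypothetical sketch: one LSTM cell followed by a linear Q-value head."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + hidden_dim  # the cell sees [observation, previous h]

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell value.
        self.W = {g: mat(hidden_dim, in_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.W_q = mat(n_actions, hidden_dim)  # linear head: h -> Q(., a)
        self.hidden_dim = hidden_dim

    def initial_state(self):
        """Zero hidden state h and cell state c, used at the start of an episode."""
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        """Consume one partial observation; return Q-values and the new state."""
        h, c = state
        x = list(obs) + h

        def affine(gate):
            return [sum(w * v for w, v in zip(row, x)) + bias
                    for row, bias in zip(self.W[gate], self.b[gate])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        cand = [math.tanh(a) for a in affine("c")]

        # Standard LSTM update: keep part of the old cell, add gated new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]

        q = [sum(w * v for w, v in zip(row, h_new)) for row in self.W_q]
        return q, (h_new, c_new)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: feed a short sequence of (made-up) observations through the network.
net = TinyDRQN(obs_dim=4, hidden_dim=8, n_actions=3)
state = net.initial_state()
rng = random.Random(1)
for obs in [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.1, 0.0]]:
    q_values, state = net.step(obs, state)
    action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
```

In a full DRQN agent the weights would be trained with the usual DQN temporal-difference loss over sequences of transitions, and the recurrent state would be reset at episode boundaries; those parts are deliberately left out of this sketch.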
