Vaccines are a highly effective intervention for mitigating the COVID-19 pandemic, but when supplies are limited, an optimal vaccine allocation plan is essential for reducing the number of infections. Most previous studies on vaccine allocation strategies, however, have neglected the fact that real-world virus transmission takes place over a network of cities with dynamically changing flows between them. To address this, we propose a Multi-City Network Vaccination Model that incorporates a stochastic daily multi-city virus transmission network to simulate a more realistic vaccination environment. We also present a novel reinforcement learning approach based on Proximal Policy Optimization (PPO) to allocate vaccines within this model. Our PPO-based dynamic vaccine allocation approach reduces peak infections by 8% and is more robust than two heuristic approaches. Our framework provides a valuable tool for regional and national authorities to make better public health decisions during a pandemic.