As global climate change intensifies and natural disasters occur more frequently, extreme events have seriously affected the energy systems of urban areas and posed major challenges to their supply and scheduling. To better integrate and manage the various energy resources in urban areas, this study constructs a Deep Q-Learning Network-Quasi Upper Confidence Bound model that uses deep reinforcement learning to learn the mapping between the states and actions of the energy system, with deep learning used to fit the complex nonlinear relationships required to optimize the system as a whole. The model is verified experimentally against a real energy system, and the improved deep reinforcement learning algorithm is compared with the Q-learning model, the PDWoLF-PHC algorithm, the Quasi Upper Confidence Bound algorithm, and the Deep Q-Learning Network algorithm. The results show that the proposed algorithm yields the smallest instantaneous area control error and the smallest absolute frequency deviation; its mean absolute frequency deviation is 45%–73% lower than that of the other algorithms. Over time, the unit output power under the proposed algorithm flexibly tracks stochastic square-wave loads. The proposed strategy can therefore provide feasible solutions to the challenges posed by extreme events and support the sustainable development and safe operation of urban energy systems.
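The abstract does not specify the Quasi Upper Confidence Bound rule or the network architecture. The sketch below is therefore only a rough illustration of the general idea of pairing Q-learning with confidence-bound exploration. It assumes a standard UCB1-style bonus and a small tabular Q-table in place of the deep network. The names `ucb_select` and `q_update` and the parameters `c`, `alpha`, and `gamma` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def ucb_select(q_values: Sequence[float], counts: Sequence[int], t: int, c: float = 2.0) -> int:
    """Choose an action by a standard UCB1-style score over Q-value estimates.

    Assumption: this is generic UCB1 exploration, not the paper's
    Quasi Upper Confidence Bound rule, which the abstract does not define.
    Any action that has never been tried (count 0) is returned first, so
    every confidence bonus below is finite.
    """
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Score = estimated value + exploration bonus that shrinks as an action is tried more.
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def q_update(
    q: List[List[float]],
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> None:
    """One-step tabular Q-learning update, applied in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    A deep Q-network would replace this table with a function approximator.
    """
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

In a load-frequency-control setting, one plausible (hypothetical) choice of reward is the negative absolute frequency deviation, so that the agent is pushed toward actions that keep the deviation small. The confidence bonus favours rarely tried actions early on and fades as visit counts grow.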