The energy green transition (EGT) is currently one of the main measures by which countries around the world address the conflict between economic growth and worsening environmental and climate problems. Cities are the centers of energy consumption, so the key to EGT lies in the urban energy green transition. This study therefore focuses on the driving mechanism of urban EGT. First, the spatial-temporal characteristics of EGT in heterogeneous Chinese cities are analyzed using methods such as the gravity model. Second, four possible paths for urban EGT (policy-driven, innovation-driven, market-driven, and behavior-driven) are discussed through theoretical analysis. Finally, using panel data for 236 Chinese cities from 2007 to 2022, this study empirically analyzes the complex driving mechanism of urban EGT. The results show that: (1) EGT in Chinese cities is ongoing. In terms of urban heterogeneity, EGT in 1- and 2-tier cities is significantly faster than in 3-, 4-, and 5-tier cities. EGT is fastest in eastern cities and slowest in northeastern cities, and it is most difficult in energy resource-based cities. In terms of spatial features, the spatial center of EGT in Chinese cities generally shifts from northwest to southeast. (2) Policy, innovation, market, and behavior drivers together constitute the complex driving mechanism of urban EGT, with policy being the primary driving force of the current round of EGT. (3) Economic development level and improvements in education level have significant positive effects on EGT in Chinese cities, whereas resource endowment and population agglomeration level have significant inhibitory effects. (4) The core driving forces of EGT differ significantly across heterogeneous cities. Both the policy-driven effect and the market-driven effect are strongest in 1- and 2-tier cities.
The innovation-driven, market-driven, and behavior-driven effects are significant only in eastern and central cities. In energy resource-based cities, the driving effect of green innovation is not significant. This study can help government departments formulate better policies to support the energy transition, promote technological innovation, design market mechanisms, and guide energy consumption behavior.