The Yellow River Basin (YRB) is a pivotal region for energy consumption and carbon emissions (CEs) in China, and cities have emerged as the main sources of regional CEs. Cities therefore play a critical role in achieving regional sustainable development and China’s carbon neutrality, which calls for a detailed examination of urban spillover effects and of the determinants of CEs within the YRB. Remote sensing data are well suited to studies that span large geographical areas and long time periods. Using a harmonized DMSP/OLS and NPP/VIIRS nighttime light dataset covering 2007 to 2021, this study quantifies the CEs of 58 prefecture-level cities in the YRB. By combining exploratory spatial data analysis (ESDA), the STIRPAT model, and a spatial econometric model, it then empirically examines the spatial spillover effects and driving factors of urban CEs. The results show that urban CEs increased in stages and converged towards a distinct spatial pattern of “lower reach > middle reach > upper reach”. Spatial autocorrelation tests reveal that agglomeration and differentiation coexist in urban CEs, with a pronounced spatial lock-in effect. Urbanization, economic development, energy consumption structure, green coverage rate, industrial structure, population, technological progress, and foreign direct investment (FDI) each have different direct and indirect effects on urban CEs. The study also discusses policy implications and future research directions, offering insights for formulating CE mitigation strategies that advance sustainable development.