Abstract

Site‐structure cluster interaction (SSCI) significantly affects the seismic response of densely distributed buildings, and quantifying it rationally and efficiently when modelling urban seismic damage can support better decisions for mitigating earthquake disasters. This paper focuses on the input motion of the structures: it extracts the main influential parameters of SSCI and adopts a loosely typed wavelet packet neural network to rapidly simulate the spatially varying ground motion in the urban environment. In the proposed framework, the wavelet packet energy ratio is introduced to describe the variation of ground motion characteristics and is used as the sole output to carry out multi‐resolution spectral modulation; the training samples are accumulated using a validated finite element simulation method. The developed surrogate model accounts for a series of factors, including earthquake intensity, site condition, configuration of the structure cluster, structural dynamic characteristics and spacing, and it outperforms a counterpart based on a conventional artificial neural network. A virtual test verifies that the waveshape and spectral features of the predicted ground motion agree well with the target result, with a peak acceleration error of only 1.23%. The suggested approach offers better modulation precision and requires fewer training samples. Moreover, the developed surrogate model can correct the ground motion of urban buildings for the influence of SSCI at almost zero cost, so that the structural seismic response can be represented more realistically in the time and space domains. These features make it a promising technique for the rapid assessment of urban seismic damage.
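To make the idea of a wavelet packet energy ratio and multi‐resolution spectral modulation concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a Haar wavelet packet basis, a signal length divisible by 2^level, and a ratio defined as the band energy of the SSCI‐affected motion divided by the band energy of a reference motion. All function names are hypothetical, and in the proposed framework the ratios would be predicted by the neural network rather than computed from a known target.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

def haar_split(x):
    """One Haar analysis step: returns (approximation, detail) halves."""
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Inverse of haar_split."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * S, (a - d) * S])
    return x

def wp_decompose(x, level):
    """Full wavelet packet tree; returns 2**level bands in natural order."""
    bands = [list(x)]
    for _ in range(level):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def wp_reconstruct(bands):
    """Rebuild the signal from the bands returned by wp_decompose."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def band_energies(bands):
    """Sum of squared coefficients in each band."""
    return [sum(c * c for c in band) for band in bands]

def energy_ratios(reference, modified, level):
    """Assumed definition: per-band energy of modified over reference."""
    e_ref = band_energies(wp_decompose(reference, level))
    e_mod = band_energies(wp_decompose(modified, level))
    return [m / r if r > 0.0 else 1.0 for m, r in zip(e_mod, e_ref)]

def modulate(reference, ratios, level):
    """Scale each band by sqrt(ratio) so its energy matches the target."""
    bands = wp_decompose(reference, level)
    scaled = [[c * math.sqrt(r) for c in band]
              for band, r in zip(bands, ratios)]
    return wp_reconstruct(scaled)

# Usage: a synthetic "free-field" record and a uniformly amplified copy.
reference = [math.sin(0.7 * i) + 0.3 * math.cos(2.1 * i) for i in range(16)]
modified = [2.0 * v for v in reference]
ratios = energy_ratios(reference, modified, level=2)
corrected = modulate(reference, ratios, level=2)
```

Scaling coefficients by the square root of the energy ratio matches the band energies of the target but leaves the phase of the reference motion unchanged, which is a simplification made for this sketch.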