Based on the interrelationship between the built environment and the spatial–temporal distribution of population density, this paper proposes a method for predicting the spatial–temporal distribution of urban population density using a deep residual network (ResNet). The study uses time-segmented mobile phone user data provided by China Mobile Communications Corporation to predict the time series of the steady-state distribution of population density. First, 40 prediction databases were constructed from the characteristics of the built environment and the spatial–temporal distribution of population density. Next, a behaviour–environment agent model (BEM) was built on the ResNet framework and used for model training and prediction. Finally, the prediction results were evaluated with an average percentage error index. In the central urban area of the validation case, prediction accuracy reached 76.92%. The proposed method could help prevent urban public safety incidents and mitigate pandemics. It could also support the construction of a "smart city" by improving the allocation of urban resources and traffic mobility.
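The abstract names an average percentage error index but does not define it. As a minimal sketch, assume the index is the mean absolute percentage error between observed and predicted population density over grid cells, and that accuracy is reported as one minus that error. Both assumptions, the handling of zero-density cells, and the function names `average_percentage_error` and `accuracy_from_error` are hypothetical illustrations, not the paper's definition.

```python
from typing import Sequence


def average_percentage_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error over grid cells (hypothetical definition).

    Cells whose observed density is zero are skipped to avoid division by zero;
    this is an assumption, since the source does not say how such cells are treated.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Relative error |predicted - observed| / |observed| for each non-empty cell.
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted) if o != 0]
    if not errors:
        raise ValueError("no cells with non-zero observed density")
    return sum(errors) / len(errors)


def accuracy_from_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    # Hypothetical mapping: accuracy = 1 - average percentage error, floored at 0.
    return max(0.0, 1.0 - average_percentage_error(observed, predicted))


if __name__ == "__main__":
    # Illustrative densities for four grid cells; the last cell is empty and skipped.
    obs = [100.0, 200.0, 50.0, 0.0]
    pred = [110.0, 180.0, 60.0, 5.0]
    print(average_percentage_error(obs, pred))
    print(accuracy_from_error(obs, pred))
```

Skipping zero-density cells is only one option; a small denominator floor or a symmetric percentage error would behave differently in sparsely populated areas.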