Abstract
Urban perception is a multifaceted process that spans visual sensing through cognitive interpretation. Earlier studies on urban perception assessment have concentrated primarily on graphic features of street view images, such as color or semantic elements, and have neglected the human visual fixation features that are vital to cognitive processing. This study used eye-tracking technology to collect human visual attention data while participants scored their emotional responses to street view images. A deep learning network model was then proposed to predict human visual attention areas from environmental features such as image color, depth, and semantic elements. These environmental features were subsequently combined with visual attention to build a perception prediction model capable of predicting urban perception on a larger scale. Finally, the reliability and superiority of the urban perception prediction method were confirmed through ablation experiments and bias analysis. The method improves the prediction accuracy of urban perception, and it also examines the combined influence of multiple features on human subjective emotions from both the external-environment perspective and the individual perspective. This approach moves beyond the earlier reliance on single-image features and provides a novel, scientific way to assess urban perception.