Abstract

This study investigates how people perceive the visual and auditory landscapes of a mixed-use urban environment. Audio and visual data were collected on local streets at different times of day using an audio recorder and camera setup. High- and low-level features were extracted from the collected audio and visual datasets using custom deep learning (DL) models and other standard algorithms. The collected data were used in a perception survey of human subjects (n = 73), in which individual perception was rated on eight auditory and six visual perceptual attributes. The survey results were then analysed in relation to the algorithmically extracted features. Finally, a 10 km street within the study area was selected, along which spatiotemporal street-level visual and auditory data were collected. Statistical analysis and machine learning modelling were performed on the surveyed dataset to predict human perception of the audio and visual scenes on the chosen street. The results identify the specific audio and visual features associated with individual perceptions; these relationships were then used to build prediction models, which in turn produced spatiotemporal visual and auditory perception maps.
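The abstract does not specify which features or models were used, so the following is only a minimal sketch of the kind of pipeline it describes: clip-level low-level audio features feeding a regression model that predicts a perceptual rating. The file paths, the placeholder ratings, and the choice of a random forest regressor are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: feature set, file paths, ratings, and model
# choice are hypothetical, standing in for the unspecified pipeline.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def low_level_audio_features(path: str) -> np.ndarray:
    """Summarise a clip with clip-level means of common low-level features."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.concatenate([mfcc, [centroid, rms]])

# Hypothetical survey data: one row per clip, target = mean rating on a
# single perceptual attribute (e.g. "pleasant"), on a 1-5 scale.
clip_paths = [f"clips/clip_{i:03d}.wav" for i in range(200)]  # placeholder paths
ratings = np.random.uniform(1, 5, size=len(clip_paths))       # placeholder ratings

X = np.stack([low_level_audio_features(p) for p in clip_paths])
X_train, X_test, y_train, y_test = train_test_split(X, ratings, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out clips:", r2_score(y_test, model.predict(X_test)))
```

Applied per attribute and per street segment, a model of this shape is one plausible way to generate the spatiotemporal perception maps the abstract mentions, with predictions aggregated over location and time of day.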
