Perception of sound environments is influenced by the context in which we experience them. A substantial portion of this contextual information comes from the visual domain, including both our understanding of what a place is and its visual features. A laboratory study combining questionnaire responses and eye-tracking was designed to investigate whether soundscape assessment outcomes and participants' behaviour inside the simulation can be explained by the perceptual outcomes driven by visual information. 360° videos and First-Order Ambisonics audio recordings of 27 different urban open spaces, taken from the International Soundscape Database, were used as stimuli, delivered via an immersive virtual reality (IVR) head-mounted display and a Higher-Order Ambisonics speaker array; a neutral grey environment with no sound reproduction served as the baseline scenario. A questionnaire deployed within the IVR simulation collected participants' responses describing their perception of the reproduced environments according to the circumplex model featured in Method A of ISO/TS 12913-2. The results revealed good coverage of the two-dimensional perceptual circumplex space and significant differences between perceptual outcomes driven by sound and those driven by visual stimuli.