This article presents the results of an empirical study investigating the influence of various types of audio (spatial and non-spatial) on user quality of experience (QoE) and visual attention in 360° videos. The study compared the head pose, eye gaze, pupil dilation, heart rate, and subjective responses of 73 users who watched ten 360° videos with different sound configurations. The configurations evaluated were no sound, non-spatial (stereo) audio, and two spatial sound conditions (first- and third-order ambisonics). The videos covered various categories and presented both indoor and outdoor scenarios. The subjective responses were analyzed using an ANOVA (Analysis of Variance) to assess mean differences between sound conditions, and data visualization was employed to enhance the interpretability of the results. The findings reveal diverse viewing patterns, physiological responses, and subjective experiences among users watching 360° videos under different sound conditions. Spatial audio, in particular third-order ambisonics, attracted heightened attention, as evidenced by increased pupil dilation and heart rate. Furthermore, the presence of spatial audio led to more diverse head poses when sound sources were distributed across the scene. These findings have important implications for the development of effective techniques for optimizing the processing, encoding, distribution, and rendering of content in virtual reality (VR) and 360° videos with spatialized audio. The insights are also relevant to the creative work of content design and enhancement, providing guidance on how spatial audio influences user attention, physiological responses, and overall subjective experience. Understanding these dynamics can help content creators and designers craft immersive experiences that leverage spatialized audio to captivate users, enhance engagement, and optimize the overall quality of VR and 360° video content. The dataset, the scripts used for data collection, the ffmpeg commands used for processing the videos, and the subjective questionnaire and its statistical analysis are publicly available.
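For readers who want a concrete picture of the statistical procedure mentioned above, the sketch below shows a minimal one-way ANOVA over per-condition QoE ratings in Python. The file name `ratings.csv` and the column names `condition` and `qoe_rating` are hypothetical placeholders for illustration only; they do not reflect the layout of the published dataset or the authors' analysis scripts.

```python
# Minimal sketch of a one-way ANOVA of subjective QoE ratings across sound conditions.
# File and column names ("ratings.csv", "condition", "qoe_rating") are hypothetical.
import pandas as pd
from scipy import stats

ratings = pd.read_csv("ratings.csv")  # e.g., one row per user x video x sound condition

# Collect ratings per sound condition: no sound, stereo, first- and third-order ambisonics.
groups = [g["qoe_rating"].to_numpy() for _, g in ratings.groupby("condition")]

# Null hypothesis: all sound conditions have the same mean QoE rating.
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```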