Abstract
Spherical harmonic (SH) interpolation is a commonly used method to spatially up-sample sparse head-related transfer function (HRTF) datasets into denser ones. However, depending on the number of sparse HRTF measurements and the SH order, this process can introduce distortions into the high-frequency representation of the HRTFs. This paper investigates whether it is possible to restore some of the distorted high-frequency HRTF components using machine learning algorithms. A combination of convolutional auto-encoder (CAE) and denoising auto-encoder (DAE) models is proposed to restore the high-frequency distortion in SH-interpolated HRTFs. Results were evaluated using both perceptual spectral difference (PSD) and localisation prediction models, both of which demonstrated significant improvement after the restoration process.
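To make the first step concrete, the following is a minimal sketch of order-limited SH interpolation of HRTF magnitude data: a least-squares SH fit on a sparse measurement grid, re-evaluated on a denser grid. The function and variable names are illustrative and not taken from the paper; the truncated SH order relative to the measurement grid is what introduces the high-frequency distortion the paper aims to restore.

```python
# Hedged sketch: least-squares SH fit of HRTF magnitudes on a sparse grid,
# then evaluation on a denser grid. Names are illustrative assumptions.
import numpy as np
from scipy.special import sph_harm


def sh_matrix(order, azimuth, colatitude):
    """Complex SH basis matrix of shape (num_directions, (order + 1)**2).

    azimuth, colatitude: arrays of angles in radians, one entry per direction.
    """
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            # scipy's sph_harm takes (m, n, theta, phi) with theta = azimuth
            # and phi = colatitude (polar angle).
            cols.append(sph_harm(m, n, azimuth, colatitude))
    return np.stack(cols, axis=1)


def sh_interpolate(hrtf_mag, sparse_az, sparse_col, dense_az, dense_col, order):
    """Fit SH coefficients to sparse HRTF magnitudes and re-evaluate densely.

    hrtf_mag: (num_sparse_directions, num_freq_bins) magnitude spectra.
    Returns (num_dense_directions, num_freq_bins) interpolated magnitudes.
    """
    Y_sparse = sh_matrix(order, sparse_az, sparse_col)
    # Least-squares SH coefficients, solved per frequency bin.
    coeffs, *_ = np.linalg.lstsq(Y_sparse, hrtf_mag, rcond=None)
    Y_dense = sh_matrix(order, dense_az, dense_col)
    # Real part taken because real-valued magnitudes are fitted with a
    # complex SH basis in this simplified sketch.
    return np.real(Y_dense @ coeffs)
```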
Highlights
Virtual reality (VR) and augmented reality (AR) technologies are on the rise, driven by the advent of commercially available and affordable VR/AR headsets, with applications in gaming, education, therapy, social media and digital culture, amongst others.
This paper investigates whether similar models can be used to restore the distorted high-frequency data in spherical harmonic (SH)-interpolated head-related transfer functions (HRTFs).
The perceptual spectral difference (PSD) model calculates the difference between two binaural signals or HRTFs, providing a perceptually motivated comparison of their spectra (see the sketch after this list).
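As a rough illustration of this kind of comparison, the sketch below computes a plain log-magnitude spectral difference between a reference HRTF and a processed (interpolated or restored) one. It is only a simplified stand-in, assuming nothing about the actual PSD model used in the paper, which additionally applies perceptual weighting.

```python
# Hedged sketch: basic log-magnitude spectral difference between two HRTFs.
# A simplified proxy, not the PSD model referenced in the paper.
import numpy as np


def log_spectral_difference(ref_mag, test_mag, eps=1e-12):
    """Mean absolute difference in dB between two magnitude spectra.

    ref_mag, test_mag: arrays of linear magnitudes with the same shape,
    e.g. (num_directions, num_freq_bins).
    """
    ref_db = 20.0 * np.log10(np.maximum(ref_mag, eps))
    test_db = 20.0 * np.log10(np.maximum(test_mag, eps))
    return float(np.mean(np.abs(ref_db - test_db)))
```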
Summary
Virtual reality (VR) and augmented reality (AR) technologies are on the rise, driven by the advent of commercially available and affordable VR/AR headsets, with applications in gaming, education, therapy, social media and digital culture, amongst others. To create a convincing sense of immersion, VR/AR technology must be able to deliver to the ears the same binaural cues as would be experienced in real life [1,2]. These cues are captured by HRTFs, and binaural rendering commonly employs a virtual loudspeaker framework, wherein methods such as vector base amplitude panning (VBAP) [5] or Ambisonics [6] are used to render sources between virtual loudspeaker points formed from the HRTFs [7]. Both methods typically require a high number of HRTF measurements to ensure good spatial resolution in the rendered audio [8].
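To illustrate the virtual loudspeaker idea, the sketch below computes standard 3D VBAP gains for a source rendered between a triplet of virtual loudspeakers (here, HRTF measurement directions). It follows the well-known VBAP formulation rather than the paper's own implementation, and the names are assumptions.

```python
# Hedged sketch: 3D VBAP gain calculation for one loudspeaker triplet.
# Illustrative only; not the paper's rendering code.
import numpy as np


def vbap_gains(source_dir, speaker_dirs):
    """Compute VBAP gains for a source inside a triplet of virtual loudspeakers.

    source_dir: unit vector of shape (3,) pointing at the virtual source.
    speaker_dirs: (3, 3) matrix whose rows are unit vectors of the three
    virtual loudspeakers. Returns normalised gains of shape (3,); a negative
    gain indicates the source lies outside this triplet.
    """
    # g = p @ L^{-1}, where the rows of L are the loudspeaker unit vectors.
    gains = source_dir @ np.linalg.inv(speaker_dirs)
    # Normalise so the gains have unit energy.
    return gains / np.linalg.norm(gains)
```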