Abstract

The spatial filtering effect imposed by sound propagation from the sound source to the outer ear is referred to as the head-related transfer function (HRTF). Personalizing the HRTF is essential for an immersive audio experience in virtual and augmented reality. Our work aims to employ deep learning to predict a customized HRTF from anthropometric measurements. However, existing measured HRTF databases each employ a different spatial sampling grid, making it difficult to combine them to train data-hungry deep learning methods, while each database individually contains only dozens of subjects. Following our previous work, we use a neural field, a neural network that maps spherical coordinates to the magnitude spectrum, to represent each subject's set of HRTFs. Using this consistent representation of the HRTF across datasets, we construct a generative model that learns a latent space across subjects. In this work, we further exploit the neural field representation to carry out HRTF personalization by learning a mapping from the anthropometric measurements to the latent space and then reconstructing the HRTF. Thanks to the grid-agnostic nature of our method, we are able to train on combined datasets and even validate performance on grids unseen during training.
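The pipeline described above can be sketched in miniature. This is an illustrative toy, not the paper's actual architecture: the layer sizes, latent dimension, number of anthropometric measurements, and number of frequency bins are all placeholder assumptions, and randomly initialized weights stand in for trained parameters. It shows the two pieces the abstract names: a mapping from anthropometric measurements to a subject latent vector, and a latent-conditioned neural field that maps continuous spherical coordinates to a magnitude spectrum, which is what makes the representation independent of any particular sampling grid.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MEAS = 10   # assumed number of anthropometric measurements
LATENT = 16   # assumed latent-space dimension
N_BINS = 128  # assumed number of frequency bins in the magnitude spectrum
HIDDEN = 64   # assumed hidden width

# Randomly initialized weights stand in for trained parameters.
Wm = rng.normal(scale=0.3, size=(N_MEAS, LATENT))      # measurements -> latent
W1 = rng.normal(scale=0.3, size=(2 + LATENT, HIDDEN))  # field input layer
W2 = rng.normal(scale=0.3, size=(HIDDEN, N_BINS))      # field output layer

def measurements_to_latent(m):
    """Map one subject's anthropometric measurements to a latent vector."""
    return np.tanh(m @ Wm)

def hrtf_field(azimuth, elevation, z):
    """Evaluate the latent-conditioned neural field at one direction.

    Because the direction is a continuous input rather than an index into
    a fixed grid, the field can be queried on ANY spherical sampling grid,
    including grids unseen during training (the grid-agnostic property).
    """
    x = np.concatenate([[azimuth, elevation], z])
    h = np.tanh(x @ W1)   # hidden layer
    return h @ W2         # predicted magnitude spectrum for this direction

# Personalize: measurements -> latent -> query HRTFs on an arbitrary grid.
m = rng.normal(size=N_MEAS)  # one subject's (synthetic) measurements
z = measurements_to_latent(m)
grid = [(az, 0.0) for az in np.linspace(0.0, 2 * np.pi, 32, endpoint=False)]
hrtfs = np.stack([hrtf_field(az, el, z) for az, el in grid])
print(hrtfs.shape)  # one spectrum per queried direction
```

Querying the same trained field on a finer or differently laid-out grid requires no retraining, which is how differently sampled HRTF databases can share one representation.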
