The head-related transfer function (HRTF) plays a critical role in how the auditory system perceives spatial information. The spectral cues embedded in the HRTF are vital for accurately determining the elevation of sound sources. Existing approaches use deep neural networks (DNNs) to predict HRTF magnitude spectra from images of the pinna, typically with the HRTF log-magnitude as the training target. However, the HRTF also encodes the acoustic effects of the head and torso, which produce direction-dependent patterns that make its spectral cues difficult to reconstruct. To address this, we propose a new method for HRTF individualization in which the model is trained with the pinna-related transfer function (PRTF) as its output, reducing the influence of head and torso reflections present in the head-related impulse response (HRIR). Experiments on an HRTF dataset show that the proposed model accurately reconstructs the first and second spectral cues and achieves lower log spectral distortion (LSD) than previous deep learning models.
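The abstract does not specify how the PRTF target is obtained from the HRIR or how LSD is computed. The following is a minimal NumPy sketch under common assumptions, not the authors' implementation. It approximates a PRTF by keeping only a short window of the HRIR after its direct-path peak, so that later head and torso reflections are zeroed out, and it computes LSD as the RMS difference (in dB) between log-magnitude spectra, averaged over directions. The function names `prtf_from_hrir` and `log_spectral_distortion`, and the `window_ms` and `n_fft` defaults, are hypothetical.

```python
import numpy as np


def prtf_from_hrir(hrir, fs, window_ms=1.0, n_fft=256):
    """Approximate a PRTF log-magnitude (dB) by time-windowing an HRIR.

    Assumption (not from the source): pinna effects arrive within about
    `window_ms` of the direct-path peak, whereas head and torso reflections
    arrive later and can be removed by zeroing the tail of the response.
    """
    hrir = np.asarray(hrir, dtype=float)
    onset = int(np.argmax(np.abs(hrir)))  # direct-path peak sample
    win_len = max(2, int(round(window_ms * 1e-3 * fs)))
    end = min(onset + win_len, hrir.size)

    windowed = hrir.copy()
    # Decreasing half-Hann taper from the peak to the end of the window.
    n = end - onset
    windowed[onset:end] *= np.hanning(2 * n)[n:]
    windowed[end:] = 0.0  # discard later (head/torso) reflections

    spectrum = np.fft.rfft(windowed, n=n_fft)
    return 20.0 * np.log10(np.abs(spectrum) + 1e-12)


def log_spectral_distortion(ref_db, est_db):
    """LSD in dB between reference and estimated log-magnitude spectra.

    Inputs have shape (..., n_bins) in dB. The RMS difference is taken over
    frequency bins, then averaged over any leading axes (e.g. directions).
    """
    diff = np.asarray(ref_db, dtype=float) - np.asarray(est_db, dtype=float)
    per_spectrum = np.sqrt(np.mean(diff ** 2, axis=-1))
    return float(np.mean(per_spectrum))


# Toy HRIR: a direct-path impulse plus a late "torso reflection".
hrir = np.zeros(256)
hrir[10] = 1.0
hrir[200] = 0.5
prtf_db = prtf_from_hrir(hrir, fs=48000)
print(log_spectral_distortion(prtf_db, prtf_db))  # 0.0
```

Because the inputs are already in dB, the difference `ref_db - est_db` equals 20·log10 of the magnitude ratio, which is the quantity LSD averages. In the toy example the late reflection falls outside the 1 ms window, so the resulting spectrum is flat.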