Abstract
Individual head-related transfer functions (HRTFs) are critical for binaural spatial audio rendering. In contrast to anthropometric parameters and pinnae images, 3D meshes allow for a more direct and comprehensive representation of the anthropometric structure, which provides highly effective inputs for modeling individualized HRTFs. This paper presents a neural network-based method for predicting individualized HRTFs in full space based on 3D meshes. Unlike many previous methods that estimate HRTF spectra at sampling grids or frequencies separately, the proposed model predicts the HRTF spectra of each vertical plane by considering the spectral correlation and continuity across adjacent sampling grids and frequencies. Evaluation results indicate that the proposed method enhances the prominence of peaks and notches in the obtained HRTF spectra and improves the speed and accuracy of HRTF individualization. The log spectral distortion of the proposed method is lower than that of state-of-the-art methods using anthropometric parameters and pinnae images. Further evaluation confirms that the proposed method requires significantly fewer points in 3D meshes when compared to numerical simulation methods. The evaluation based on localization models demonstrates that the HRTFs predicted by the proposed method are perceptually similar to the measured HRTFs.
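The abstract reports log spectral distortion (LSD) without defining it. The sketch below assumes one common definition: the root-mean-square, over frequency bins, of the decibel difference between the measured and predicted HRTF magnitudes for a single direction. The function name, the `eps` guard, and the choice of frequency bins are illustrative assumptions, not details taken from the paper, which may average over a specific frequency range, over directions, or over subjects.

```python
import math


def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """RMS dB difference between two HRTF spectra for one direction.

    h_ref, h_est: equal-length sequences of (complex or real) frequency-bin
    values. eps is an assumed small constant that avoids log(0).
    """
    if len(h_ref) != len(h_est) or len(h_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")

    squared_sum = 0.0
    for ref_bin, est_bin in zip(h_ref, h_est):
        # Magnitude ratio in dB for this frequency bin.
        diff_db = 20.0 * math.log10((abs(ref_bin) + eps) / (abs(est_bin) + eps))
        squared_sum += diff_db ** 2

    # Root-mean-square over frequency bins, in dB.
    return math.sqrt(squared_sum / len(h_ref))
```

Under this definition, identical spectra give 0 dB, and a prediction whose magnitude is off by a factor of 10 at every bin gives 20 dB. Lower values indicate a closer spectral match.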