We propose a head-related transfer function (HRTF) personalization method in which a convolutional neural network (CNN) learns subjects' HRTFs from the scanned geometry of their heads. The trained model can then predict the full HRTF set (for all directions) from a subject's head scan alone. In our trial, the HUTUBS HRTF database was used as the training set. A truncated spherical harmonic expansion of the head scan data preserves the boundary-shape features that are important in the acoustic scattering process, and each subject's HRTFs were likewise represented by a truncated spherical harmonic expansion. The spherical harmonic transform (SHT) coefficients of the scanned head geometry and of the HRTFs serve as training data for a CNN that can subsequently predict HRTFs from geometric scan data. Leave-one-out validation with the log-spectral distortion (LSD) metric was used for evaluation. The results show low LSD in both the spatial and temporal dimensions relative to the ground truth, and a lower LSD than the finite-element acoustic simulations of HRTFs provided with the database. In continuing work, we are validating the predictions in listening tests.
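The evaluation metric above can be sketched as follows. This is a common definition of log-spectral distortion (the RMS, over frequency bins, of the dB magnitude ratio between measured and predicted spectra); the paper's exact variant (frequency range, weighting) may differ, and the function name and per-bin averaging are assumptions here.

```python
import numpy as np

def log_spectral_distortion(h_true, h_pred):
    """LSD in dB between two magnitude spectra of equal length.

    Assumed definition: RMS over frequency bins of
    20 * log10(|H(f_k)| / |H_hat(f_k)|).
    """
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    # Per-bin dB ratio between ground-truth and predicted magnitudes.
    ratio_db = 20.0 * np.log10(h_true / h_pred)
    # Root-mean-square across frequency bins gives a single dB figure.
    return float(np.sqrt(np.mean(ratio_db ** 2)))
```

Identical spectra give an LSD of 0 dB; a uniform factor-of-10 magnitude error gives 20 dB, so typical reported values of a few dB indicate close spectral agreement.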