A data-driven exploration of elevation cues in HRTFs: an explainable AI perspective across multiple datasets

Abstract
Precise elevation perception in binaural audio remains a challenge, despite extensive research on head-related transfer functions (HRTFs) and spectral cues. While prior studies have advanced our understanding of sound localization cues, the interplay between spectral features and elevation perception is still not fully understood. This paper presents a comprehensive analysis of over 600 subjects from 11 diverse public HRTF datasets, employing a convolutional neural network (CNN) model combined with explainable artificial intelligence (XAI) techniques to investigate elevation cues. In addition to testing various HRTF pre-processing methods, we focus on both within-dataset and inter-dataset generalization and explainability, assessing the model’s robustness across different HRTF variations stemming from subjects and measurement setups. By leveraging class activation mapping (CAM) saliency maps, we identify key frequency bands that may contribute to elevation perception, providing deeper insights into the spectral features that drive elevation-specific classification. This study offers new perspectives on HRTF modeling and elevation perception by analyzing diverse datasets and pre-processing techniques, expanding our understanding of these cues across a wide range of conditions.
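As a rough illustration of the CAM technique mentioned above, the sketch below builds a toy one-layer 1-D CNN over an HRTF magnitude spectrum and computes a class activation map by weighting the final feature maps with the classifier weights after global average pooling. All shapes, weights, and the number of elevation classes are invented for illustration; this is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(x, kernels):
    # x: (F,) magnitude spectrum; kernels: (K, W) -> (K, F) feature maps ('same' padding)
    K, W = kernels.shape
    pad = W // 2
    xp = np.pad(x, pad)
    return np.array([[np.dot(xp[i:i + W], k) for i in range(x.size)] for k in kernels])

# Toy HRTF magnitude spectrum (F frequency bins) and an illustrative 1-layer CNN.
F, K, W, n_classes = 64, 8, 5, 4             # 4 elevation classes; all sizes invented
x = rng.standard_normal(F)
kernels = rng.standard_normal((K, W)) * 0.1
w_cls = rng.standard_normal((n_classes, K))  # classifier weights after global average pooling

feat = relu(conv1d(x, kernels))              # (K, F) feature maps
gap = feat.mean(axis=1)                      # global average pooling -> (K,)
logits = w_cls @ gap                         # per-class elevation scores
c = int(np.argmax(logits))

# CAM for the predicted class: weight each feature map by its classifier weight.
cam = w_cls[c] @ feat                        # (F,) saliency over frequency bins
salient_bins = np.argsort(cam)[-5:]          # bands the model "attends" to
```

A saliency curve like `cam` is what gets inspected for consistent peaks across subjects and datasets.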

Similar Papers
  • Dissertation
  • Citations: 1
  • 10.25911/5d78d9d868376
Experimental guided spherical harmonics based head-related transfer function modeling
  • Apr 5, 2013
  • Mengqiu Zhang

In this thesis we investigate experimentally guided, spherical-harmonics-based Head-Related Transfer Function (HRTF) modeling, in which HRTFs are parameterized by frequency and source location. We focus on efficiently representing the HRTF variations in sufficient detail through mathematical modeling and experimental measurements. The goal of this work is an optimal functional HRTF model that reduces computational cost and eases HRTF interpolation and/or extrapolation in headphone-based binaural systems. To represent HRTFs by models, we first consider the high variability of HRTFs among individuals, caused by differences in how individual bodies scatter the incident sound waves. We conduct a series of statistical analyses on an experimental HRTF database of human subjects to reveal the correlation between the physical features of human beings, especially the pinna, head, and torso, and the corresponding HRTFs. The strategy enables us to identify a minimal set of physical features that strongly influence the HRTFs in a direct physical way. We next consider the continuity of the HRTF representation in both the spatial and frequency domains. We define a functional HRTF model class in which the spatial representation is well approximated by a finite number of spherical harmonics, while the frequency representation remains the focus of this thesis. To seek an efficient representation for the HRTF frequency portion, we derive a metric that numerically evaluates the efficiency of different complete orthonormal bases. We show that the complex exponentials form the most efficient basis. Given the identified basis, we then provide a solution to determine the dimensionality of the representation.
To represent HRTFs by measurements, we first consider the required angular resolution and the most suitable sampling scheme, taking into account the two-dimensional angular direction and the wide audio frequency range. We review the spherical harmonic analysis of the HRTF, from which the least number of spatial samples required for HRTF measurement is derived. Considering how the HRTF data should be sampled on the sphere, we propose a list of requirements for determining the HRTF measurement grid. In addition to explaining how to measure the HRTF over the sphere according to the identified scheme, we propose a fast spherical harmonic transform algorithm. We next consider a feasible experimental setup for a non-anechoic situation, that is, measurements made in the presence of some reverberation. We emphasize the design of the test signal and the post-processing used to extract HRTFs.
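The spherical-harmonics portion of this modeling can be sketched as a least-squares fit of SH coefficients to HRTF magnitudes sampled over the sphere, after which the continuous representation can be evaluated at any direction. The snippet below uses a hand-written order-1 real SH basis and synthetic data; the grid, noise level, and truncation order are all illustrative.

```python
import numpy as np

def real_sh_order1(dirs):
    # dirs: (N, 3) unit vectors; returns (N, 4) real SH basis up to order 1
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.column_stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x])

rng = np.random.default_rng(1)
# Toy measurement grid: N random directions on the sphere (not a real HRTF grid).
N = 50
v = rng.standard_normal((N, 3))
dirs = v / np.linalg.norm(v, axis=1, keepdims=True)

# Synthetic "HRTF magnitude at one frequency": exactly order-1, plus noise.
Y = real_sh_order1(dirs)                     # (N, 4) basis matrix
true_coef = np.array([1.0, 0.3, -0.5, 0.2])
h = Y @ true_coef + 0.01 * rng.standard_normal(N)

# Least-squares fit of SH coefficients, then evaluation at an unmeasured direction.
coef, *_ = np.linalg.lstsq(Y, h, rcond=None)
h_interp = real_sh_order1(np.array([[0.0, 0.0, 1.0]])) @ coef
```

A real pipeline would do this per frequency bin and at much higher SH order, which is where the sampling-grid requirements discussed in the thesis come in.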

  • Conference Article
  • Citations: 1
  • 10.1109/icdipc.2019.8723710
Head Related Transfer Function Interpolation Based on Finite Impulse Response Models
  • May 1, 2019
  • Bahaa Al-Sheikh + 2 more

The Head Related Transfer Function (HRTF) describes how the ear, and the head in general, responds to sound arriving from a given point in space. It is defined as the ratio of the spectrum at the eardrum to the spectrum at the sound source. HRTF modeling and reconstruction have been widely used in auditory nervous system research and virtual reality technology, especially in wearable virtual auditory display (VAD) devices. VADs are currently used in physiological and psychoacoustic research, medical applications, military simulations, and industry, in addition to entertainment. HRTFs are usually measured at a finite set of azimuth and elevation directions because of practical limitations. To create a complete VAD, HRTFs are modeled or synthesized at finer spatial resolution, in directions where they were not measured. This paper tracks the systematic movement of zeros in Finite Impulse Response (FIR) models of Directional Transfer Functions (DTFs), defined as the directional components of the HRTFs, and interpolates that movement to create HRTFs at new directions. HRTFs of subjects from a human HRTF database are used for this purpose, and the reconstructed HRTFs are evaluated by objective and subjective tests for validation. Interpolation is successfully applied and validated at several directions for the chosen model orders.
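The zero-interpolation idea can be sketched as: factor two neighbouring FIR models into their zeros, pair the zeros, interpolate each pair, and rebuild the coefficients. The greedy nearest-neighbour pairing and the toy 2nd-order coefficients below are simplifications for illustration, not the paper's tracking procedure.

```python
import numpy as np

def interpolate_fir_zeros(b1, b2, t):
    # b1, b2: FIR coefficient vectors of the same order; t in [0, 1].
    z1 = np.roots(b1)
    z2 = np.roots(b2).copy()
    # Greedy nearest-neighbour pairing of zeros (a stand-in for proper tracking).
    paired = []
    for z in z1:
        j = np.argmin(np.abs(z2 - z))
        paired.append((1 - t) * z + t * z2[j])
        z2 = np.delete(z2, j)
    # Rebuild FIR coefficients from the interpolated zeros; interpolate the gain too.
    gain = (1 - t) * b1[0] + t * b2[0]
    return np.real_if_close(gain * np.poly(paired))

# Two toy 2nd-order DTF models at neighbouring directions (coefficients invented).
b_a = np.array([1.0, -1.2, 0.5])
b_b = np.array([1.0, -1.0, 0.4])
b_mid = interpolate_fir_zeros(b_a, b_b, 0.5)  # model "halfway" between the directions
```

With conjugate zero pairs interpolated consistently, the rebuilt coefficients stay real.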

  • Dissertation
  • Citations: 5
  • 10.7907/sy24-x458
Neural computations leading to space-specific auditory responses in the barn owl.
  • Jan 1, 2002
  • Benjamin Jacob Arthur

Sound localization is the ability to pinpoint the direction a sound is coming from based on auditory cues alone. Neurons in the brain which mediate this behavior are active only when sound comes from a particular direction. This thesis uses physiological and anatomical methods to investigate the computations which lead to such space-specific neural responses in the barn owl. Chapter 3 studies a behavioral and neural phenomenon called phase ambiguity, which arises from the way in which the auditory nerve and cochlear nuclei encode acoustic information. Phase ambiguity causes errors in sound localization to be made for tonal stimuli, and is resolved through the convergence of information across different frequencies in broadband noise stimuli. Data presented here show that a continuous band of noise is not necessary; a set of tones spaced at the critical bandwidth resolves phase ambiguity just as well as a noise stimulus. This is due to a sub-linear interaction for tones of nearby frequencies. Chapter 4 addresses the head-related transfer function (HRTF) model of sound localization. While traditional barn owl models use linear equations to relate interaural time differences (ITD) to azimuth and interaural intensity differences (IID) to elevation, the HRTF model purports that IID is dependent on frequency to such an extent that pattern recognition is used to match the spectral shape of IID in the stimulus to that characteristic of particular directions in space. Data presented here confirm predictions made by the HRTF model that IID tuning changes with frequency in space-mapped neurons, and that two-tone stimuli whose IIDs match these changes elicit better responses than those which do not. Chapter 5 investigates the computation of space-specificity in the forebrain. Previous anatomical studies have suggested that the space-specificity seen there is not merely inherited from the space map in the midbrain, but rather arises, at least in part, independently. 
The data presented here reconfirm that the forebrain pathway branches off from the midbrain pathway before the convergence across frequencies leads to space-specific neurons. All previous computations, however, including the formation of ITD-IID combination sensitivity, seem to be shared. Collectively, these three studies expand our knowledge of the neurophysiology of sound localization in the barn owl by detailing specific mechanisms underlying the computation of space-specific neural responses.

  • Book Chapter
  • 10.1007/978-1-4020-8735-6_56
Modeling of Head-Related Transfer Functions Through Parallel Adaptive Filters
  • Jan 1, 2008
  • Kenneth John Faller + 2 more

Currently, sound spatialization techniques that utilize “individual” Head-Related Transfer Functions (HRTFs) require the intended listener to undergo lengthy measurements with specialized equipment. Alternatively, the use of generic HRTFs may contribute to additional localization errors. A third possibility that we are pursuing is the customization of HRTFs, performed on the basis of geometrical measurements of the intended listener to determine the appropriate parameters in a structural HRTF model. However, an initial step of decomposing measured HRTFs in order to reveal the parameters of the structural model must be performed. A new approach for the decomposition of HRTFs is suggested and evaluated on simulated examples. The potential of this method for the decomposition of measured HRTFs is discussed.

  • Conference Article
  • Citations: 2
  • 10.1109/icassp.1998.679630
Head-related transfer function modeling in 3-D sound systems with genetic algorithms
  • May 12, 1998
  • Ngai-Man Cheung + 2 more

Head-related transfer functions (HRTFs) describe the spectral filtering that occurs between a source sound and the listener's eardrum. Since HRTFs vary as a function of the relative source location and subject, practical implementation of 3D audio must take into account a large set of HRTFs for different azimuths and elevations. Previous work has proposed several HRTF models for data reduction. This paper describes our work in applying genetic algorithms to find a set of HRTF basis spectra, and the normal equation method to compute the optimal combination of linear weights to represent the individual HRTFs at different azimuths and elevations. The genetic algorithm selects the basis spectra from the set of original HRTF amplitude responses, using an average relative spectral error as the fitness function. Encouraging results from the experiments suggest that genetic algorithms provide an effective approach to this data reduction problem.
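The normal-equation step can be sketched independently of the genetic search: given a candidate set of basis spectra, the optimal linear weights follow from a least-squares solve, and the average relative spectral error serves as the fitness the GA would minimise. In the sketch below the basis is simply the first few measured responses, standing in for the GA's selection; all sizes and data are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "measured" HRTF magnitude responses: D directions x F frequency bins.
D, F, n_basis = 30, 32, 5
H = rng.standard_normal((D, F)) ** 2 + 0.1   # invented positive spectra

# Stand-in for the genetic algorithm: here the basis spectra are simply the
# first n_basis measured responses (the GA would search over this selection).
B = H[:n_basis].T                            # (F, n_basis) basis spectra

# Normal-equation solve for the optimal linear weights of every HRTF.
W = np.linalg.solve(B.T @ B, B.T @ H.T)      # (n_basis, D) weight matrix
H_hat = (B @ W).T                            # reconstructed spectra

# Average relative spectral error: the GA's fitness function.
err = np.mean(np.linalg.norm(H - H_hat, axis=1) / np.linalg.norm(H, axis=1))
```

Responses that are themselves in the basis reconstruct exactly; the GA's job is picking the subset that minimises `err` over all directions.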

  • Research Article
  • Citations: 12
  • 10.1108/ijpcc-06-2014-0035
Synthetic individual binaural audio delivery by pinna image processing
  • Aug 26, 2014
  • International Journal of Pervasive Computing and Communications
  • Simone Spagnol + 3 more

Purpose – The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna. Design/methodology/approach – The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source. Findings – A preliminary localization test shows that the model is able to statically render the elevation of a virtual sound source better than non-individual HRTFs. Research limitations/implications – Results encourage a deeper analysis of the psychoacoustic impact that the individualized HRTF model has on the perceived elevation of virtual sound sources. Practical implications – The model has low complexity and is suitable for implementation on mobile devices. The resulting hardware/software package will hopefully give any user easy, low-tech access to custom spatial audio. Originality/value – The authors show that custom binaural audio can be successfully deployed without the need for cumbersome subjective measurements.

  • Conference Article
  • Citations: 5
  • 10.1121/1.4799575
The role of spatial detail in sound-source localization: Impact on HRTF modeling and personalization.
  • Jan 1, 2013
  • Griffin D Romigh + 3 more

While current head-related transfer function (HRTF) personalization methods offer some ability to quickly customize spatial auditory displays, these techniques generally lack the realism and performance provided by full individualized HRTF measurements. This poor performance is likely due to the vast amount of individual spectral and spatial variation contained in a measured HRTF. While some of this variation contains important directional information, Kulkarni and Colburn (1998) showed that perceptually irrelevant spectral variation could be eliminated by smoothing the HRTF magnitude with a truncated Fourier series expansion. The present study investigates a related method for smoothing the spatial variation contained in an HRTF magnitude by utilizing a truncated spherical harmonic expansion. The perceptual impacts of various degrees of spatial smoothing were evaluated by comparing localization performance against that obtained with full individualized HRTF measurements in a virtual localization task. Results indicate that comparable localization performance can be achieved with as little as a fourth-order spherical harmonic representation, which provides a significant amount of spatial smoothing. Analysis of the resulting simplified representation also uncovered a number of interesting relationships across individuals that may facilitate the development of future techniques that estimate and personalize HRTFs.

  • Research Article
  • 10.1121/1.4806291
The role of spatial detail in sound-source localization: Impact on head-related transfer function modeling and personalization
  • May 1, 2013
  • The Journal of the Acoustical Society of America
  • Griffin D Romigh + 3 more

Current techniques designed to personalize generic head-related transfer functions (HRTFs) have some capacity to quickly customize spatial auditory displays, but these techniques generally fall short of the level of realism and performance provided by fully individualized HRTF measurements. This residual performance deficit reflects inaccuracies due to vast amounts of spatial and spectral variation that occur across the measured HRTFs of individual listeners. Some of this variation encodes perceptually important directional information, but a substantial proportion does not. Kulkarni and Colburn (1998) showed that perceptually irrelevant spectral variation could be eliminated by smoothing the HRTF magnitude with a truncated Fourier-series expansion. The present study investigates a related method for smoothing the spatial variation contained in the HRTF by utilizing a truncated spherical harmonic expansion. The impact of spatial smoothing was evaluated by comparing localization performance with individualized HRTFs which were fully represented or had various degrees of spatial smoothing. Results indicate that a highly smoothed fourth-order spherical harmonic representation can produce localization accuracy comparable to that of a full individualized HRTF. Analysis of the resulting simplified HRTF representations also uncovered a number of interesting relationships across different individuals which may provide new insights for the development of future HRTF personalization and estimation techniques.

  • Research Article
  • Citations: 3
  • 10.5050/ksnvn.2008.18.6.642
Performance Comparison of Head-Related Transfer Function Modeling Methods Using Principal Components Analysis
  • Jun 20, 2008
  • Transactions of the Korean Society for Noise and Vibration Engineering
  • Sungmok Hwang + 2 more

This study deals with modeling of head-related transfer functions (HRTFs) using principal components analysis (PCA) in the time and frequency domains. Four PCA models based on head-related impulse responses (HRIRs), complex-valued HRTFs, augmented HRTFs, and log-magnitudes of HRTFs are investigated. The objective of this study is to compare the modeling performances of the PCA models in the least-squares sense and to show the theoretical relationship between them. In terms of the number of principal components needed for modeling, the PCA models based on HRIRs or augmented HRTFs showed more efficient modeling performance than the PCA model based on complex-valued HRTFs. The PCA model based on HRIRs in the time domain and that based on augmented HRTFs in the frequency domain are shown to be theoretically equivalent, related by the Fourier transform. The modeling performance of the PCA model based on log-magnitudes of HRTFs cannot be compared with that of the other PCA models, because it deals with log-scaled magnitude components only, whereas the other models consider both magnitude and phase components on a linear scale.
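A minimal numpy sketch of the PCA modeling idea, using a synthetic low-rank HRIR matrix (sizes and noise level invented): reconstruction error in the least-squares sense shrinks as more principal components are retained.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy HRIR dataset: D directions x T taps, with low-rank structure plus noise.
D, T, rank = 40, 64, 3
latent = rng.standard_normal((D, rank)) @ rng.standard_normal((rank, T))
hrirs = latent + 0.05 * rng.standard_normal((D, T))

def pca_reconstruct(X, n_pc):
    # PCA via SVD of the mean-removed data; reconstruct with n_pc components.
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (U[:, :n_pc] * s[:n_pc]) @ Vt[:n_pc] + mu

err0 = np.linalg.norm(hrirs - hrirs.mean(axis=0))       # mean-only baseline
err3 = np.linalg.norm(hrirs - pca_reconstruct(hrirs, 3))
err10 = np.linalg.norm(hrirs - pca_reconstruct(hrirs, 10))
```

The time-domain (HRIR) and frequency-domain (augmented HRTF) models in the paper are Fourier-transform pairs of the same decomposition, so a sketch in either domain illustrates both.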

  • Research Article
  • Citations: 70
  • 10.1109/89.748123
Common-acoustical-pole and zero modeling of head-related transfer functions
  • Mar 1, 1999
  • IEEE Transactions on Speech and Audio Processing
  • Y Haneda + 3 more

Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTFs) for various directions of sound incidence. The HRTFs are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do. The common acoustical poles are estimated as poles shared by the HRTFs for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer direction-dependent parameters (the order of the zeros) than do the conventional all-zero or pole/zero models. Furthermore, the proposed model can extract zeros that the conventional models miss because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTFs on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model. The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTFs.
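The shared-pole, direction-dependent-zero structure can be sketched as evaluating H(z) = B_d(z)/A(z) with one common denominator and one numerator per direction. The pole locations and numerator coefficients below are invented for illustration, not estimated from data.

```python
import numpy as np

def freq_response(b, a, n_fft=64):
    # Evaluate B(z^-1)/A(z^-1) on n_fft points of the unit circle, u = e^{-jw}.
    u = np.exp(-1j * 2 * np.pi * np.arange(n_fft) / n_fft)
    return np.polyval(b[::-1], u) / np.polyval(a[::-1], u)

# One common denominator (acoustical poles, e.g. ear-canal resonances) ...
a_common = np.poly([0.9 * np.exp(1j * 0.3), 0.9 * np.exp(-1j * 0.3)]).real

# ... and direction-dependent numerators (zeros); coefficients invented.
b_dir = {
    "front": np.array([1.0, -0.5, 0.1]),
    "back":  np.array([1.0, -0.2, 0.3]),
}

H = {d: freq_response(b, a_common) for d, b in b_dir.items()}
```

Only the numerator coefficients change with direction, which is exactly where the parameter savings of the CAPZ model come from.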

  • Research Article
  • Citations: 2
  • 10.1115/1.2203337
Numerical Modeling of Head-Related Transfer Functions Using the Boundary Source Representation
  • Apr 4, 2006
  • Journal of Vibration and Acoustics
  • Mingsian R Bai + 1 more

A technique based on the virtual source representation is presented for modeling head-related transfer functions (HRTFs). This method is motivated by the theory of simple layer potential and the principle of wave superposition. Using the virtual source representation, the HRTFs for a human head with pinnae are calculated with a minimal amount of computation. In the process, a special regularization scheme is required to calculate the equivalent strengths of the virtual sources. To justify the proposed method, tests were carried out to compare the virtual source method with the boundary element method (BEM) and a direct HRTF measurement. The HRTFs obtained using the virtual source method agree reasonably well with the other methods in terms of frequency response, directional response, and impulse response. From a numerical perspective, the virtual source method obviates the singularity problem commonly encountered in the BEM, and is less demanding than the BEM in terms of computational time and memory storage. Subjective experiments were also conducted using the calculated and the measured HRTFs. The results reveal that the virtual-source HRTFs satisfactorily reproduce the spatial characteristics of sound localization as a human listener would naturally perceive them.
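The regularized solve at the heart of the virtual source method can be sketched as a Tikhonov-regularised least-squares problem for the equivalent source strengths, q = (G^H G + λI)^{-1} G^H p, with G a free-field Green's function matrix. The geometry, wavenumber, and regularisation parameter below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def greens_matrix(field_pts, src_pts, k=10.0):
    # Free-field Green's function e^{-jkr} / (4 pi r) from sources to field points.
    r = np.linalg.norm(field_pts[:, None, :] - src_pts[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

# Toy geometry: virtual sources well inside a sphere, pressure sampled outside.
M, N = 25, 10                                    # field points, virtual sources
field = rng.standard_normal((M, 3))
field *= 1.2 / np.linalg.norm(field, axis=1, keepdims=True)
src = rng.standard_normal((N, 3))
src *= 0.3 / np.linalg.norm(src, axis=1, keepdims=True)

G = greens_matrix(field, src)                    # (M, N) propagation matrix
q_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)
p = G @ q_true                                   # simulated boundary pressure

# Tikhonov-regularised solve for the equivalent source strengths.
lam = 1e-8
q = np.linalg.solve(G.conj().T @ G + lam * np.eye(N), G.conj().T @ p)
```

The regularisation term stabilises the inversion when G is ill-conditioned, which is the situation the paper's "special regularization scheme" addresses.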

  • Conference Article
  • Citations: 14
  • 10.1109/acssc.1999.832423
Modeling of head related transfer functions for immersive audio using a state-space approach
  • Oct 24, 1999
  • P Georgiou + 1 more

Accurate localization of sound in 3-D space is based on variations in the spectrum of sound sources. These variations arise mainly from reflection and diffraction effects caused by the pinnae and are described through a set of head-related transfer functions (HRTFs) that are unique for each azimuth and elevation angle. A virtual sound source can be rendered in the desired location by filtering with the corresponding HRTF for each ear. Previous work on HRTF modeling has mainly focused on methods that attempt to model each transfer function individually. These methods are generally computationally complex and cannot be used for real-time spatial rendering of multiple moving sources. We provide an alternative approach, which uses a multiple-input single-output state-space system to create a combined model of the HRTFs for all directions. This method exploits the similarities among the different HRTFs to achieve a significant reduction in the model size with a minimum loss of accuracy.
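The combined-model idea can be sketched as one state-space system whose inputs are indexed by direction: the A matrix (the shared dynamics) is reused by every direction, and only the corresponding input column differs. The sizes and matrices below are invented; a real model would be fitted to measured HRTFs.

```python
import numpy as np

rng = np.random.default_rng(5)

# Shared-dynamics model: one state-space system whose D inputs are the source
# signals at D directions and whose single output is one ear signal.
n, D = 6, 4                                      # state order, directions (invented)
A = 0.5 * rng.standard_normal((n, n))
A /= max(1.0, np.max(np.abs(np.linalg.eigvals(A))) / 0.9)  # keep it stable
B = rng.standard_normal((n, D))                  # one input column per direction
C = rng.standard_normal((1, n))

def ear_signal(u):
    # u: (T, D) direction-domain inputs; returns (T,) ear output.
    x = np.zeros(n)
    y = np.empty(u.shape[0])
    for t, ut in enumerate(u):
        y[t] = (C @ x).item()
        x = A @ x + B @ ut                       # shared dynamics for all directions
    return y

# An impulse from direction 0 only: the response is that direction's HRIR model.
u = np.zeros((32, D))
u[0, 0] = 1.0
h0 = ear_signal(u)
```

Because A and C are shared, adding a direction costs only one extra column of B, which is where the model-size reduction comes from.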

  • Research Article
  • Citations: 5
  • 10.1121/1.421691
Measuring and modeling the effect of source distance in head-related transfer functions
  • May 1, 1998
  • The Journal of the Acoustical Society of America
  • Jyri Huopaniemi + 1 more

Efficient modeling of human spatial hearing by digital filter approximations of head-related transfer functions (HRTFs) is the key technology in 3-D sound processing. It is well known that the HRTF bears the major static localization cues, the interaural time difference (ITD), and the interaural level difference (ILD) that are functions of frequency and the incident angle of arrival. The effect of source distance has, however, often been neglected in HRTF models. In this paper, a method for efficient distance-dependent HRTF modeling is presented, which is based on both theoretical and empirical data. HRTF measurements on eight human subjects and one dummy head were carried out at two source distances, 2 and 0.65 m. It has been argued in the literature that the distance changes mainly affect the ILD, whereas the ITD remains approximately constant. Based on this finding, which was also supported by the measurements performed in this study, a filter structure that models the ILD change as a function of distance was derived. The results of this study are applicable to many near-field listening applications.
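A distance-dependent ILD filter in the spirit of this paper can be sketched as a frequency shelf whose gain grows as the source moves inside the reference distance, while the ITD is left untouched. All constants in the sketch are invented; only the two measurement distances (2 and 0.65 m) come from the abstract.

```python
import numpy as np

def ild_gain(freqs, distance, ref_dist=2.0, fc=2000.0, strength=3.0):
    # Illustrative near-field ILD model (fc and strength are invented): inside
    # the reference distance the level difference grows, mostly at high
    # frequencies, so the shelf gain in dB scales with ref_dist / distance.
    shelf = freqs / (freqs + fc)                  # 0 at DC -> 1 at high frequency
    gain_db = strength * (ref_dist / distance - 1.0) * shelf
    return 10.0 ** (gain_db / 20.0)

freqs = np.linspace(0.0, 16000.0, 64)
g_near = ild_gain(freqs, 0.65)                    # 0.65 m, the near distance measured
g_far = ild_gain(freqs, 2.0)                      # 2 m reference: unity gain
```

Such a gain curve would multiply the far-field HRTF magnitude of the nearer ear; a matched attenuation would apply to the far ear.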

  • Conference Article
  • Citations: 1
  • 10.1109/iwaenc.2018.8521349
A Parametric Method for Elevation Control
  • Sep 1, 2018
  • Dingding Yao + 3 more

Elevation perception plays a crucial role in binaural audio rendering. To synthesize immersive and natural sound, the binaural room impulse response (BRIR) is widely applied. However, because of the limitations of recording equipment, most measured BRIR databases do not include elevation angles. In this paper, we propose a parametric method for elevation control that models and modifies the spectral features of the direct components in BRIRs, which play a dominant role in elevation perception. We first decompose and extract the significant spectral features contained in the Head Related Transfer Function (HRTF) spectrum that are perceptually important for vertical localization. Based on these features, the direct components of different BRIRs can be modified, and entire elevated BRIRs can then be generated by combining the modified direct components with the late components of the original BRIRs. Listening tests reveal that the proposed method can effectively control the elevation perception of the given BRIR database.
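The direct/late decomposition can be sketched as: window off the first part of the BRIR, reshape its magnitude spectrum toward a target elevation cue while keeping the phase, and splice the late part back on. The split index and the flat target spectrum below are illustrative stand-ins, not the paper's feature model.

```python
import numpy as np

rng = np.random.default_rng(6)

def elevate_brir(brir, target_spectrum, split=64):
    # Split the BRIR into direct and late parts, reshape the direct part's
    # magnitude spectrum toward a target elevation cue, then recombine.
    direct, late = brir[:split], brir[split:]
    D = np.fft.rfft(direct)
    shaped = target_spectrum * np.exp(1j * np.angle(D))  # new magnitude, old phase
    new_direct = np.fft.irfft(shaped, n=split)
    return np.concatenate([new_direct, late])

# Toy BRIR: decaying noise; the late part carries the (unchanged) room sound.
brir = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)
target = np.ones(33)                             # flat target magnitude (rfft of 64)
out = elevate_brir(brir, target)
```

In the paper the target magnitude would come from HRTF spectral features for the desired elevation rather than a flat curve.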

  • Research Article
  • Citations: 7
  • 10.1121/10.0016854
Modeling individual head-related transfer functions from sparse measurements using a convolutional neural network.
  • Jan 1, 2023
  • The Journal of the Acoustical Society of America
  • Ziran Jiang + 4 more

Individual head-related transfer functions (HRTFs) are usually measured with high spatial resolution or modeled with anthropometric parameters. This study proposed an HRTF individualization method that requires only spatially sparse measurements, using a convolutional neural network (CNN). The HRTFs were represented as two-dimensional images in which the horizontal and vertical ordinates indicated direction and frequency, respectively. The CNN was trained on a prior HRTF database, using the HRTF images measured at specific sparse directions as input and the corresponding high-spatial-resolution images as output. The HRTFs of a new subject can then be recovered by the trained CNN from the sparsely measured HRTFs. Objective experiments showed that, when using 23 directions to recover individual HRTFs at 1250 directions, the spectral distortion (SD) is around 4.4 dB; when using 105 directions, the SD reduced to around 3.8 dB. Subjective experiments showed that the individualized HRTFs recovered from 105 directions had a smaller discrimination proportion than the baseline method and were perceptually indistinguishable in many directions. This method combines the spectral and spatial characteristics of HRTFs for individualization, which has potential for improving virtual reality experiences.
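The HRTF-as-image representation can be sketched with a dense direction-by-frequency matrix and a sparse subset of its rows. Below, plain per-frequency linear interpolation over the direction axis stands in for the trained CNN (which would learn exactly this sparse-to-dense mapping); the data are synthetic and all sizes are invented.

```python
import numpy as np

# HRTF "image": rows = directions, columns = frequency bins (smooth synthetic data).
D_full, F = 60, 48
col = (2.0 + np.sin(np.linspace(0.0, 2.0 * np.pi, D_full)))[:, None]
full = col * np.linspace(1.0, 2.0, F)[None, :]

# Keep a sparse subset of directions, as in a quick measurement session.
sparse_idx = np.concatenate([np.arange(0, D_full, 6), [D_full - 1]])
sparse = full[sparse_idx]

# Baseline stand-in for the CNN: per-frequency linear interpolation over the
# direction axis, recovering the dense image from the sparse rows.
recovered = np.empty_like(full)
for f in range(F):
    recovered[:, f] = np.interp(np.arange(D_full), sparse_idx, sparse[:, f])

max_err = np.max(np.abs(full - recovered))       # crude stand-in for SD
```

A trained CNN would exploit cross-subject structure to beat this interpolation baseline, especially with very few measured directions.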
