Abstract

In this work, a binaural model resembling the human auditory system was built using a pair of three-dimensional (3D)-printed ears to localize a sound source in both the vertical and horizontal directions. The proposed model was first analysed to study the correlations between the spatial auditory cues and the 3D polar coordinates of the source. Apart from the estimation techniques based on interaural and spectral cues, a property of the combined direct and reverberant energy decay curve is also introduced as part of the localization strategy. The preliminary analysis reveals that the latter provides a much more accurate distance estimate than approximations based on sound pressure level, but is not sufficient on its own to resolve front-rear confusions. For vertical localization, it is also shown that the elevation angle can be robustly encoded through the spectral notches. By analysing the strengths and shortcomings of each estimation method, a new localization algorithm is formulated, which is further improved by cross-correlating the interaural and spectral cues. The proposed technique has been validated via a series of experiments in which the sound source was randomly placed at 30 different locations in an outdoor environment at distances of up to 19 m. Based on the experimental and numerical evaluations, the localization performance has been significantly improved, with an average distance estimation error of 0.5 m and a considerable reduction of total ambiguous points to 3.3%.

Highlights

  • With regard to the frequency response, the amplitude on the azimuth and elevation planes changes significantly enough to be distinguishable from other coordinates

  • In this analysis, which was set up outdoors, the sound source was initially placed in front of the device under test (DUT) on the azimuth plane (i.e., θ = 0°, φ = 0°), and data were captured with the source at distances ranging from 1 m to 19 m

  • For DRT60, which refers to the time for the combined direct and reverberant energy level to decay by 60 dB, the perceived signal was first band-passed to the desired frequency range of 2–4 kHz
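
The DRT60 highlight above can be illustrated with a minimal sketch. The source states only that the signal is band-passed to 2–4 kHz and that DRT60 is the time for the combined direct and reverberant energy level to decay by 60 dB. Everything else here is an assumption made for illustration: the second-order band-pass filter, the backward-integrated (Schroeder-style) decay curve, the straight-line fit over the -5 dB to -35 dB range, and the function names `bandpass_biquad`, `decay_curve_db`, and `estimate_drt60`. It is not the authors' implementation.

```python
import math


def bandpass_biquad(x, fs, f_lo=2000.0, f_hi=4000.0):
    """Second-order band-pass filter (assumed design, 0 dB peak gain)."""
    f0 = math.sqrt(f_lo * f_hi)            # geometric centre frequency
    q = f0 / (f_hi - f_lo)                 # quality factor from bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0       # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def decay_curve_db(y):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB."""
    edc = [0.0] * len(y)
    total = 0.0
    for i in range(len(y) - 1, -1, -1):    # accumulate energy from the tail
        total += y[i] * y[i]
        edc[i] = total
    if total == 0.0:
        raise ValueError("signal has no energy in the analysis band")
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf
            for e in edc]


def estimate_drt60(x, fs, hi_db=-5.0, lo_db=-35.0):
    """Estimate the 60 dB decay time (seconds) of the band-passed signal.

    Assumption: the decay is roughly linear in dB, so a least-squares
    line fitted between hi_db and lo_db is extrapolated to 60 dB.
    """
    curve = decay_curve_db(bandpass_biquad(x, fs))
    pts = [(i / fs, d) for i, d in enumerate(curve) if lo_db <= d <= hi_db]
    if len(pts) < 2:
        raise ValueError("decay curve does not span the fit range")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    slope = sxy / sxx                      # dB per second (negative)
    return -60.0 / slope
```

Calling `estimate_drt60(samples, fs)` on a recorded burst returns the estimated decay time in seconds. Fitting a line and extrapolating, rather than waiting for the curve to actually reach -60 dB, is a common choice when the noise floor outdoors would otherwise mask the tail of the decay.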

Summary

Background

In acoustics and robotics, a minimum of three microphones is required to triangulate a sound source in a two-dimensional (2D) space [1,2]. To enhance the localization and disambiguation performance while retaining the binaural hearing technique and structure, a number of recent works have proposed using active ears, inspired by animals such as bats that are able to change the shape of their pinnae [26,27,28]. In this regard, the ears act as actuators that can induce dynamic binaural cues for better prediction. The technique proposed in this study has been validated via a series of experiments in which the sound source was randomly placed at 30 different locations, with a distance between the source and receiver of up to 19 m.

Sound Ambiguity
Direct and Reverberant Energy Fields
Ambiguity Elimination and Distance Estimation
Binaural Localization Strategy
Experiments and Performance Evaluations
Findings