Abstract

Time delay estimation (TDE) is well understood, which makes it attractive for acoustic source localization (ASL). A time delay between a pair of microphones maps into a hyperbola in space. Furthermore, the likelihoods of different time delays map into a set of weighted, nonoverlapping hyperbolae in the spatial domain. Combining the TDE functions from several microphone pairs yields a spatial likelihood function (SLF), which is a combination of sets of weighted hyperbolae. Traditionally, the point that maximizes the SLF is taken as the source location, but this estimate is corrupted by reverberation and noise. Particle filters use past source information to improve localization performance in such environments. However, it remains unclear how the TDE functions should be combined. Results from simulated dialogues in various conditions favor combining the TDE functions with intersection-based methods rather than with union. The real-data dialogue results agree with the simulations, showing a 45% reduction in RMSE when the intersection of the TDE functions is chosen over their union.
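The difference between intersection-based and union-based combination can be illustrated with a small sketch. It is not the article's implementation: the Gaussian-shaped pairwise likelihood, the width `sigma`, the speed of sound, the microphone layout, and the choice of product for intersection and sum for union are all assumptions made for illustration.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def pair_delay(points, mic_a, mic_b):
    """Arrival-time difference (s) between mic_a and mic_b for each point."""
    da = np.linalg.norm(points - mic_a, axis=-1)
    db = np.linalg.norm(points - mic_b, axis=-1)
    return (da - db) / C

def pair_likelihood(points, mic_a, mic_b, measured_delay, sigma=5e-5):
    """Hypothetical TDE function: a Gaussian bump around the measured delay.
    In space it is large along the hyperbola matching that delay."""
    tau = pair_delay(points, mic_a, mic_b)
    return np.exp(-0.5 * ((tau - measured_delay) / sigma) ** 2)

def combine(likelihoods, mode):
    """Combine pairwise likelihood maps into one SLF."""
    stacked = np.stack(likelihoods)
    if mode == "intersection":   # assumed here: product of the pairwise maps
        return stacked.prod(axis=0)
    if mode == "union":          # assumed here: sum of the pairwise maps
        return stacked.sum(axis=0)
    raise ValueError(f"unknown mode: {mode}")

# 4 m x 4 m room sampled on a 5 cm grid, one microphone in each corner.
xs = np.linspace(0.0, 4.0, 81)
ys = np.linspace(0.0, 4.0, 81)
gx, gy = np.meshgrid(xs, ys, indexing="ij")
grid = np.stack([gx, gy], axis=-1)              # shape (81, 81, 2)
mics = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
pairs = [(0, 1), (1, 2), (2, 3), (3, 0)]
source = np.array([1.5, 2.5])

# Noise-free "measurements": the true delay of the source for each pair.
likelihoods = [
    pair_likelihood(grid, mics[a], mics[b], pair_delay(source, mics[a], mics[b]))
    for a, b in pairs
]

def argmax_point(slf):
    """Grid coordinates of the SLF maximum."""
    i, j = np.unravel_index(np.argmax(slf), slf.shape)
    return np.array([xs[i], ys[j]])

slf_int = combine(likelihoods, "intersection")
slf_uni = combine(likelihoods, "union")
est_int = argmax_point(slf_int)
est_uni = argmax_point(slf_uni)
```

In this noise-free setting both SLFs peak at the source. The product is large only where all hyperbolae meet, whereas the sum stays elevated along each individual hyperbola, so the union SLF has at least as many grid cells above half of its maximum as the intersection SLF.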

Highlights

  • Passive acoustic source localization (ASL) methods are attractive for surveillance applications, which are a constant topic of interest

  • In [28], the fact that a time delay is inverse-mapped into multiple spatial coordinates was utilized to reduce the number of spatial likelihood function (SLF) grid evaluations by considering only the neighborhood of the n highest time delay estimation (TDE) function values

  • This article discusses a class of acoustic source localization (ASL) methods based on a two-step approach: first, the measurement data are transformed using a time delay estimation (TDE) function; then, the resulting TDE functions are combined to produce the spatial likelihood function (SLF)
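The first step of the two-step approach can be sketched as follows. The excerpt does not say which TDE function the article uses; the sketch assumes GCC-PHAT, a common choice, and the function names, sampling rate, and speed of sound are hypothetical.

```python
import numpy as np

def gcc_phat(x, y, fs, max_delay):
    """TDE function via generalized cross-correlation with PHAT weighting
    (an assumed choice). A positive lag means x arrives later than y."""
    n = len(x) + len(y)                       # zero-pad to avoid wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12            # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(round(max_delay * fs)))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1) / fs
    return lags, cc

def tde_to_space(points, mic_a, mic_b, lags, cc, c=343.0):
    """Map a TDE function to space: every point receives the TDE value
    at the delay that point would produce for this microphone pair."""
    tau = (np.linalg.norm(points - mic_a, axis=-1)
           - np.linalg.norm(points - mic_b, axis=-1)) / c
    return np.interp(tau, lags, cc)

# Usage: white noise that reaches microphone a five samples after microphone b.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(2048)
sig_b = s
sig_a = np.roll(s, 5)                         # delayed copy of s
lags, cc = gcc_phat(sig_a, sig_b, fs=fs, max_delay=1e-3)
estimated_delay = lags[np.argmax(cc)]
```

Applying `tde_to_space` to each microphone pair produces the per-pair spatial maps that the second step combines into the SLF.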


Summary

Introduction

Passive acoustic source localization (ASL) methods are attractive for surveillance applications, which are a constant topic of interest. Another popular application is human interaction analysis in smart rooms with multimodal sensors. Automating the perception of human activities is a popular research topic that is often approached from the aspect of localization. Large databases of smart room recordings are available for system evaluation and development [1]. A typical ASL system consists of several spatially separated microphones. The ASL output is either the source direction or the source location in two- or three-dimensional space, obtained from the phase information [2] and/or the amplitude [3] of the received signal, and possibly from sequential information through tracking [4].

