EM-Based TDOA Estimation of a Speech Source via Gaussian Mixture Models in Noisy and Anechoic Environments

Zhihua Lu, Joao P J Da Costa, Tai Fei

doi:10.1109/access.2021.3119749

Open Access

https://doi.org/10.1109/access.2021.3119749

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 1	License type: CC BY 4.0

Affiliation: Ningbo University, Hella (Germany)

Abstract

The propagation delay difference of a speech signal travelling from the source to the microphones, known as the time difference of arrival (TDOA), encodes the position of the speech source. TDOA estimation plays a vital role in systems such as teleconferencing and far-field speech recognition, since the TDOA is a key parameter affecting the quality of restored speech signals. This paper is devoted to estimating the TDOA of one speech source on a frame-by-frame basis in noisy and anechoic environments. First, we propose two variants of a Gaussian mixture model to represent the speech signal received by a microphone pair, assuming Gaussianity of the signal and modeling speech sparsity by the speech presence probability (SPP). Second, after estimating the noise parameter in advance and formulating the speech parameters using the maximum likelihood principle, the proposed Gaussian mixture models are reduced to depend on only two unknowns, i.e., the TDOA and the SPP. Third, following these two models, we present two distinct estimators that estimate the TDOA and the SPP iteratively based on the expectation-maximization (EM) algorithm. The two proposed estimators are free from the ad hoc parameter selection required in many classical approaches. Simulation results show that the TDOA they estimate can be more accurate than that of state-of-the-art GCC variants over a wide range of frames with specific SPP values. More importantly, the automatically estimated SPP, which can serve as a soft voice activity detector, encodes information about the TDOA estimation accuracy. In a speech frame, a large estimated SPP indicates a small TDOA estimation error. For example, when the SPP is larger than 0.76 and 0.87 in the two proposed estimators, respectively, the TDOA estimation error could be at most 19% of that in the worst case.

Highlights

Speech source localization, i.e., determining the spatial position of a speech source, is a fundamental issue in ad hoc acoustic sensor networks composed of distributed microphones [1]–[3], and it is attracting growing interest in applications such as teleconferencing [4], far-field speech recognition [5], and surveillance [6].
Many source localization approaches have been developed in recent decades; they can be classified into two groups, namely spatial spectrum and time-frequency (TF) processing (see Table 1).
It results in four time difference of arrival (TDOA) values, that is, 7.0691×10^-5 s, −1.4126×10^-4 s, 0 s and 6.7436×10^-4 s.

Summary

Introduction

Speech source localization, i.e., determining the spatial position of a speech source, is a fundamental issue in ad hoc acoustic sensor networks composed of distributed microphones [1]–[3], and it is attracting growing interest in applications such as teleconferencing [4], far-field speech recognition [5], and surveillance [6]. Mainstream localization applications focus on either time difference of arrival (TDOA) estimation [7], [8] or direction of arrival (DOA) estimation [9]. Many source localization approaches have been developed in recent decades; they can be classified into two groups, namely spatial spectrum and time-frequency (TF) processing (see Table 1). The spatial spectrum approaches construct a spectrum function of the spatial parameters (i.e., TDOA or DOA). The locations of the highest peaks of the spectrum function indicate the TDOA (or DOA) candidates. The spectrum function can be constructed by methods such as the generalized cross-correlation (GCC) algorithm [10] and subspace-based methods. The GCC function is expressed by inserting a weight
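To make the spatial-spectrum idea concrete, the following is a minimal sketch of a GCC-based TDOA estimate for one microphone pair. It assumes the PHAT weighting, which is only one common choice of GCC weight, and it is not the EM-based estimator proposed in the paper. The function name `gcc_phat_tdoa`, the sampling rate, and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1, in seconds, via GCC-PHAT.

    Hypothetical helper for illustration; positive output means x2 lags x1.
    """
    n = len(x1) + len(x2)                # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)             # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weight: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)        # GCC function over circular lags

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so the array covers lags -max_shift .. +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift  # location of the highest peak
    return lag / fs

# Synthetic check (assumed values): white noise delayed by 5 samples at 16 kHz
rng = np.random.default_rng(0)
fs = 16000
delay = 5
s = rng.standard_normal(4096)
x1 = s + 0.05 * rng.standard_normal(s.size)
x2 = np.concatenate((np.zeros(delay), s[:-delay])) + 0.05 * rng.standard_normal(s.size)

tau_hat = gcc_phat_tdoa(x1, x2, fs, max_tau=1e-3)
print(f"estimated TDOA: {tau_hat:.6e} s")
```

The peak search is restricted to `max_tau` because the physically possible TDOA is bounded by the microphone spacing divided by the speed of sound; the bound used here is an arbitrary value chosen for the synthetic example.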
