Abstract

A method is proposed for time delay estimation (TDE) from mixed source (speaker) signals collected at two spatially separated microphones. The key idea is that the crosscorrelation between corresponding segments of the mixed source signals is computed using the outputs of single frequency filtering (SFF) obtained at several frequencies, rather than using the collected waveforms directly. The advantage of the SFF output is that it has regions of high signal-to-noise ratio (SNR) in both the time and frequency domains. It also provides multiple pieces of evidence, one from each SFF output, and these are combined to make the TDE robust. The estimated time delays can be used to determine the number of speakers present in the mixed signals. The TDE is shown to be robust against different types and levels of degradation. Results are shown for actual mixed signals collected at two spatially separated microphones in a live laboratory environment, where the mixed signals contain speech from several spatially distributed speakers.

Highlights

  • This paper proposes a method of estimating the time delay of a speaker’s speech collected at two spatially separated microphones in a live laboratory environment

  • We propose that the envelopes of the single frequency filtering (SFF) outputs of these two signals at each frequency can be used for computing the crosscorrelation

  • The method is based on SFF analysis of speech signals, which is known to give signal components with high signal-to-noise ratio (SNR) in different regions in the time and frequency domains
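The idea in the highlights above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a common formulation of SFF that is not spelled out in this summary: the signal is frequency-shifted so that the frequency of interest lands at half the sampling rate, then passed through a single-pole filter with its pole at z = -r, and the magnitude of the complex output is taken as the envelope. The function names, the value r = 0.99, the chosen frequencies, the omission of any pre-differencing step, and the simple summation used to combine the per-frequency crosscorrelations are all illustrative assumptions. The sketch also handles a single segment only, not the frame-wise processing the paper describes.

```python
import numpy as np

def sff_envelope(x, fs, freq_hz, r=0.99):
    """Envelope of x at one frequency (assumed SFF form: shift + single-pole filter)."""
    n = np.arange(len(x))
    # Shift the frequency of interest to fs/2 (normalized frequency pi).
    w_shift = np.pi - 2.0 * np.pi * freq_hz / fs
    shifted = x * np.exp(1j * w_shift * n)
    # Single-pole filter y[n] = -r * y[n-1] + shifted[n], pole at z = -r.
    y = np.empty(len(x), dtype=complex)
    prev = 0j
    for i, v in enumerate(shifted):
        prev = v - r * prev
        y[i] = prev
    return np.abs(y)

def estimate_delay(x1, x2, fs, freqs_hz, max_lag):
    """Delay (in samples) of x2 relative to x1; positive means x2 lags x1."""
    lags = np.arange(-max_lag, max_lag + 1)
    combined = np.zeros(len(lags))
    for f in freqs_hz:
        e1 = sff_envelope(x1, fs, f)
        e2 = sff_envelope(x2, fs, f)
        e1 -= e1.mean()
        e2 -= e2.mean()
        # full[k + len(e1) - 1] = sum_n e2[n + k] * e1[n]
        full = np.correlate(e2, e1, mode="full")
        mid = len(e1) - 1
        cc = full[mid - max_lag: mid + max_lag + 1]
        # Normalize so each frequency contributes comparable evidence
        # (assumption: plain summation as the combination rule).
        combined += cc / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12)
    return int(lags[np.argmax(combined)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.standard_normal(3000)                 # synthetic stand-in for a microphone signal
    x2 = np.concatenate([np.zeros(7), x1[:-7]])    # same signal delayed by 7 samples
    freqs = [500, 1000, 1500, 2000, 2500, 3000]    # illustrative analysis frequencies
    print(estimate_delay(x1, x2, 8000, freqs, max_lag=20))
```

With real recordings, an estimate like this would be computed per frame, and the distribution of the estimated delays over frames would then be examined, as the paper's discussion of prominent peaks suggests.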


Summary

Introduction

This paper proposes a method of estimating the time delay of a speaker’s speech collected at two spatially separated microphones in a live laboratory environment. It is interesting to note that the percentage of frames (α) from the two prominent peaks in Fig. 2d (79%) is higher than the corresponding values of 68%, 65% and 70% obtained from Fig. 2a–c, respectively. This indicates the advantage of the proposed SFF-based method over other methods, especially the most popular GCC-PHAT method.

Babble Noise at Different Levels
Different Types of Noises at 0 dB SNR
Findings
Summary and Conclusions