Abstract

This paper proposes a novel weighting algorithm for Cross-power Spectrum Phase (CSP) analysis to improve the accuracy of direction of arrival (DOA) estimation for beamforming in a noisy environment. Our sound source is a human speaker and the noise is broadband noise in an automobile. The harmonic structures in the human speech spectrum can be used for weighting the CSP analysis, because harmonic bins must contain more speech power than the others and thus give us more reliable information. However, most conventional methods leveraging harmonic structures require pitch estimation with voiced-unvoiced classification, which is not sufficiently accurate in noisy environments. In our new approach, the observed power spectrum is directly converted into weights for the CSP analysis by retaining only the local peaks considered to be harmonic structures. Our experiment showed the proposed approach significantly reduced the errors in localization, and it showed further improvements when used with other weighting algorithms.

Highlights

  • The performance of automatic speech recognition (ASR) is severely affected in noisy environments

  • Our approach is like a continuous converter from an input spectrum to a weight vector, which can be locally large for the bins whose harmonic structures are distinct

  • We previously proposed a speech enhancement method called Local Peak Enhancement (LPE), which provides robust ASR even in very low SNR conditions caused by driving noise from an open window or by a loud air conditioner [17]

Summary

Introduction

The performance of automatic speech recognition (ASR) is severely affected in noisy environments. Profile Fitting [5] can address diffraction and reflection, with the advantage of reducing the effects of noise sources through localization. Among these methods, CSP analysis is popular because it is accurate, reliable, and simple. The assumption is that a subband with more speech power conveys more reliable information for localization. It did not use the harmonic structures of human speech. Because the harmonic bins must contain more speech power than the other bins, they should give us more reliable information in noisy environments. We want a special weighting algorithm that puts larger weights on the bins where the harmonic structures are distinct, without requiring explicit pitch detection and voiced-unvoiced classification.
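
To make the idea of weighting the CSP analysis concrete, the following is a minimal sketch in Python with NumPy, not the paper's implementation. The function names (`local_peak_weights`, `weighted_csp_delay`, `doa_from_delay`) are hypothetical. The binary local-peak mask is a simplified stand-in for the paper's continuous conversion from spectrum to weights. The delay-to-angle step assumes a far-field source, two microphones, and a sound speed of 343 m/s, none of which is stated in this summary.

```python
import numpy as np


def local_peak_weights(power_spectrum):
    """Simplified stand-in: weight 1 on local maxima of the spectrum, 0 elsewhere.

    Assumption: the paper describes a continuous converter; this binary mask
    only illustrates the idea of keeping the local peaks.
    """
    p = np.asarray(power_spectrum, dtype=float)
    w = np.zeros_like(p)
    # An interior bin is a peak if it exceeds its left neighbour
    # and is at least as large as its right neighbour.
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])
    w[1:-1] = is_peak.astype(float)
    return w


def weighted_csp_delay(x1, x2, weights=None, max_lag=None):
    """Estimate the delay (in samples) of x2 relative to x1 by CSP analysis.

    weights: optional per-bin weights, one per rfft bin (length n // 2 + 1).
    max_lag: optional bound on the absolute lag that is searched.
    """
    n = len(x1)
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = np.conj(X1) * X2
    # Phase transform: keep only the phase of the cross-power spectrum.
    phase = cross / np.maximum(np.abs(cross), 1e-12)
    if weights is not None:
        phase = phase * np.asarray(weights, dtype=float)
    csp = np.fft.irfft(phase, n=n)
    # Map circular indices to signed lags.
    lags = np.arange(n)
    lags[lags > n // 2] -= n
    if max_lag is not None:
        csp = np.where(np.abs(lags) <= max_lag, csp, -np.inf)
    return int(lags[np.argmax(csp)])


def doa_from_delay(delay_samples, sample_rate, mic_distance, sound_speed=343.0):
    """Far-field, two-microphone assumption: angle in radians from broadside."""
    s = sound_speed * delay_samples / (sample_rate * mic_distance)
    return float(np.arcsin(np.clip(s, -1.0, 1.0)))
```

In use, the weights would be computed from the observed power spectrum of a frame, for example `local_peak_weights(np.abs(np.fft.rfft(frame)) ** 2)`, and passed to `weighted_csp_delay`, so that bins at spectral peaks dominate the correlation while the remaining bins are ignored.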

Sound Source Localization Using CSP Analysis
Harmonic Structure-Based Weighting
Experiment
Findings
Conclusion