Abstract

Previously, a dereverberation method based on generalized spectral subtraction (GSS) using multi-channel least mean-squares (MCLMS) was proposed. Speech recognition experiments showed that this method achieved a significant improvement over conventional methods. In this paper, we apply this method to distant-talking (far-field) speaker recognition. For far-field speech, however, GSS-based dereverberation combined with clean speech models degrades speaker recognition performance. This may be because GSS-based dereverberation introduces distortion between clean speech and dereverberated speech. We address this problem by training speaker models on dereverberated speech obtained by suppressing reverberation from arbitrary artificial reverberant speech. Furthermore, we propose a computationally efficient method for combining the likelihoods of dereverberated speech obtained with multiple compensation parameter sets, which addresses the problem of determining optimal compensation parameters for GSS. We report the results of a speaker recognition experiment performed on large-scale far-field speech recorded in reverberant environments that differ from the training environments. The proposed GSS-based dereverberation method achieves a recognition rate of 92.2%, compared with conventional cepstral mean normalization with delay-and-sum beamforming using a clean speech model (49.0%) and a reverberant speech model (88.4%). We also compare the proposed method with another dereverberation technique, multi-step linear prediction-based spectral subtraction (MSLP-GSS), which achieves 90.6%. The use of multiple compensation parameter sets further improves recognition performance, giving our approach a recognition rate of 93.6%. Finally, we implement the method in a real environment using the optimal compensation parameters estimated in an artificial environment. The results show a recognition rate of 87.8%, compared with 72.5% for delay-and-sum beamforming using a reverberant speech model.

Highlights

  • Because of reverberation in far-field environments, recognition performance for distant-talking speech and speakers is drastically degraded

  • The proposed method gives an error reduction rate of 35.8% compared with LTLSS and 24.7% compared with multi-step linear prediction (MSLP)-generalized spectral subtraction (GSS)

  • Previously, Wang et al. proposed a blind dereverberation method based on GSS that employed multi-channel least mean-squares (MCLMS) for hands-free speech recognition [22]
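
The error reduction rates quoted in the highlights are relative reductions in error rate. As a minimal sketch, assuming the usual definition (error rate = 100% minus recognition rate), the function below applies it to two recognition rates stated in the abstract: 88.4% for the reverberant speech model baseline and 92.2% for the proposed method. The function name is hypothetical. The recognition rates behind the 35.8% and 24.7% figures are not given in this excerpt, so the example does not attempt to reproduce them.

```python
def error_reduction_rate(baseline_accuracy, proposed_accuracy):
    """Relative error reduction (%) between two recognition rates given in percent.

    Assumes error rate = 100 - recognition rate; this is the common
    definition, not one confirmed by the source text.
    """
    baseline_error = 100.0 - baseline_accuracy
    proposed_error = 100.0 - proposed_accuracy
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Rates from the abstract: reverberant speech model baseline vs. proposed method.
print(round(error_reduction_rate(88.4, 92.2), 1))  # prints 32.8
```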


Summary

Introduction

Reverberation in far-field environments drastically degrades recognition performance for distant-talking speech and speakers. Because of multiple reflections and diffusions of the sound waves, the energy of previous speech is smeared over time and overlaps with subsequent speech. The resulting reverberation lasts much longer than the window used for short-term spectral analysis, a problem known as late reverberation [8]. A blind deconvolution-based approach for restoring speech that has been degraded by the acoustic environment was proposed in [19]. This scheme processed the phase-only output from two microphones using cepstrum operations and signal reconstruction theory. In [12], a multi-channel speech dereverberation method based on spectral subtraction, using a statistical model to estimate the power spectrum, was proposed. The method first estimates late reverberations using long-term multi-step linear prediction, and then suppresses them with spectral subtraction. The drawback of this approach is that the optimum parameters for spectral subtraction are estimated empirically from a development dataset, so the late reverberation cannot be subtracted correctly because it is not precisely modeled.
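
The subtraction step described above can be sketched as follows. This is a minimal illustration of generalized spectral subtraction in its common textbook form, not the authors' implementation: it assumes an estimate of the late-reverberation magnitude spectrum is already available (how that estimate is obtained, for example by MCLMS or multi-step linear prediction, is outside the sketch). The names `n`, `alpha`, and `beta` (spectral exponent, subtraction weight, and spectral floor) are hypothetical stand-ins for the compensation parameters.

```python
def generalized_spectral_subtraction(observed, late_reverb, n=1.0, alpha=1.0, beta=0.1):
    """Subtract an estimated late-reverberation spectrum from an observed one.

    observed, late_reverb: magnitude spectrograms as lists of frames,
    each frame a list of non-negative floats (one per frequency bin).
    n:     exponent parameter; subtraction is done on |X|^(2n)
           (n = 1 is power subtraction, n = 0.5 is magnitude subtraction).
    alpha: weight on the reverberation estimate (assumed parameter).
    beta:  floor, as a fraction of the observed |X|^(2n), that keeps
           the result non-negative (assumed parameter).
    """
    exponent = 2.0 * n
    cleaned = []
    for obs_frame, rev_frame in zip(observed, late_reverb):
        out_frame = []
        for obs, rev in zip(obs_frame, rev_frame):
            obs_pow = obs ** exponent
            residual = obs_pow - alpha * rev ** exponent
            # Flooring prevents negative values when the estimate is too large.
            floored = max(residual, beta * obs_pow)
            # Map back from the |X|^(2n) domain to a magnitude.
            out_frame.append(floored ** (1.0 / exponent))
        cleaned.append(out_frame)
    return cleaned


# One frame, two bins: the second bin hits the spectral floor.
result = generalized_spectral_subtraction([[2.0, 1.0]], [[1.0, 1.0]])
```

Because the best values of such parameters depend on the acoustic conditions, a single empirically tuned setting can over- or under-subtract, which is the drawback the paragraph above points to.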

