Abstract

We introduce a multiengine speech processing system that can detect the location and type of an audio signal in variable noisy environments. The system locates the audio source using a microphone array, determines whether the captured audio is speech or nonspeech, and then estimates the signal-to-noise ratio (SNR) using a Discrete-Valued SNR Estimator. Using this SNR value, instead of trying to adapt the speech signal to the speech processing system, we adapt the speech processing system to the environment in which the speech signal was captured. In this paper, we introduce the Discrete-Valued SNR Estimator and a multiengine classifier that uses either Multiengine Selection or Multiengine Weighted Fusion, with speaker identification (SI) as the example speech processing task. The Discrete-Valued SNR Estimator achieves an accuracy of 98.4% in characterizing the environment's SNR. Compared to a conventional single-engine SI system, accuracy improved by as much as 9.0% with Multiengine Selection and 10.0% with Multiengine Weighted Fusion.
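
The abstract does not spell out how the estimator or the two multiengine strategies work internally. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes one SI engine per hypothetical discrete SNR level (0 to 20 dB in 5 dB steps), stands in for the Discrete-Valued SNR Estimator by snapping a continuous estimate from the standard formula 10·log10(P_signal / P_noise) to the nearest level, and assumes the fusion weights are a per-level confidence score. All names (`quantize_snr`, `select_engine`, `fuse_engines`, the score dictionaries) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical discrete SNR levels (dB), one SI engine trained per level.
# These values are illustrative assumptions, not taken from the paper.
SNR_LEVELS_DB: List[float] = [0.0, 5.0, 10.0, 15.0, 20.0]


def snr_db(signal_power: float, noise_power: float) -> float:
    """Standard SNR definition in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(signal_power / noise_power)


def quantize_snr(snr: float, levels: Sequence[float] = SNR_LEVELS_DB) -> float:
    """Stand-in for a discrete-valued estimator: snap to the nearest level."""
    return min(levels, key=lambda level: abs(level - snr))


def select_engine(scores_by_level: Dict[float, Dict[str, float]], snr: float) -> str:
    """Selection: trust only the engine whose training SNR is nearest the
    estimate, and return that engine's top-scoring speaker."""
    level = quantize_snr(snr, list(scores_by_level))
    scores = scores_by_level[level]
    return max(scores, key=scores.get)


def fuse_engines(scores_by_level: Dict[float, Dict[str, float]],
                 level_weights: Dict[float, float]) -> str:
    """Weighted fusion: combine every engine's per-speaker scores, weighting
    each engine by an assumed confidence in its SNR level."""
    total = sum(level_weights.values())
    if total <= 0:
        raise ValueError("level weights must sum to a positive value")
    fused: Dict[str, float] = {}
    for level, scores in scores_by_level.items():
        weight = level_weights.get(level, 0.0) / total
        for speaker, score in scores.items():
            fused[speaker] = fused.get(speaker, 0.0) + weight * score
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical per-engine speaker scores (higher means more likely).
    demo_scores = {
        0.0: {"alice": 0.2, "bob": 0.8},
        10.0: {"alice": 0.7, "bob": 0.3},
        20.0: {"alice": 0.6, "bob": 0.4},
    }
    estimate = snr_db(signal_power=10.0, noise_power=1.0)  # 10 dB
    print(quantize_snr(estimate))                # 10.0
    print(select_engine(demo_scores, estimate))  # alice
    print(fuse_engines(demo_scores, {0.0: 0.1, 10.0: 0.8, 20.0: 0.1}))  # alice
```

Selection discards every engine but one, so a wrong SNR decision directly picks the wrong model; fusion softens that failure mode by letting neighbouring engines contribute, which is one plausible reason a fused system could edge out pure selection.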

Highlights

  • Speech processing systems, such as speaker identification (SI) and automatic speech recognition (ASR), have two operating modes: a training mode and a testing mode

  • We introduce a speech processing system that adapts to the environment surrounding the speech signal, rather than trying to adapt the speech signal to the system; no speech enhancement is applied to the captured speech signal

  • Compared to an SI engine trained only in a clean environment, accuracy improved by as much as 30.6%

Summary

Introduction

Speech processing systems, such as speaker identification (SI) and automatic speech recognition (ASR), have two operating modes: a training mode and a testing mode. The authors of [5] reported that their proposed combination methods improved ASR accuracy by 5.4% compared to the best-performing standalone speech enhancement technique. This combination relied on a noise environment detector with a detection accuracy of 54%. The instrument can identify the type and location of nonspeech audio signals (e.g., footsteps, breaking windows, and cocktail noise) [9]. It can be incorporated into a variety of applications, such as hands-free audio conferencing systems, to perform voice-based security authentication of conference users [10].
