Abstract

The successful implementation of speech processing systems in the real world depends on their ability to handle adverse acoustic conditions such as room reverberation and background noise. This study proposes an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation. The extension is based on fuzzy c-means clustering and aims to improve separation ability in underdetermined situations using a nonlinear microphone array. Rather than testing blind source separation solely under reverberant conditions, the evaluation also covers a variety of simulated and real-world noisy environments. The results show encouraging separation ability and improved perceptual quality of the separated sources under such adverse conditions. This not only establishes the proposed methodology as a credible improvement to the system but also suggests further applicability in areas such as noise suppression in adverse acoustic environments.

Highlights

  • The ability of the human cognitive system to distinguish between multiple, simultaneously active sources of sound is a remarkable quality that is often taken for granted

  • Prior to evaluating the effectiveness of fuzzy c-means (FCM) clustering for mask estimation in the multiple sensors degenerate unmixing estimation technique (MENUET) framework, the FCM was evaluated in a simple stereo setup for a variety of feature sets in order to test its feasibility in this context

  • This study has presented an extension to the existing MENUET algorithm for underdetermined blind source separation (BSS) in adverse environments
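
To make the idea of fuzzy c-means clustering for mask estimation more concrete, the following is a minimal sketch of the standard fuzzy c-means updates in plain Python. It is an illustration only and not the authors' implementation: the function names, the one-dimensional toy features, the fuzzifier value m = 2 and the evenly spaced initial centres are all assumptions made for this example. Each input value stands in for a feature computed at one time-frequency point, and each row of the returned memberships could serve as soft mask weights for that point.

```python
def _memberships(points, centers, m):
    """Standard FCM membership update for 1-D points (hypothetical helper)."""
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if min(dists) == 0.0:
            # The point coincides with one or more centres: share membership among them.
            hits = [d == 0.0 for d in dists]
            rows.append([1.0 / sum(hits) if h else 0.0 for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-9):
    """Cluster 1-D features; return (centers, memberships)."""
    lo, hi = min(points), max(points)
    # Assumed initialisation: centres spread evenly over the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(n_clusters):
            # Centre update: mean of the points weighted by membership ** m.
            weights = [row[k] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, points)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the final centres.
    return centers, _memberships(points, centers, m)


# Toy features: two tight groups plus one ambiguous point (0.55) between them.
features = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95, 0.55]
centers, masks = fuzzy_c_means(features, 2)
```

Unlike a hard clustering, which would assign the ambiguous point entirely to one source, the fuzzy memberships split it between both clusters, and every row of `masks` sums to one.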

Summary

Introduction

The ability of the human cognitive system to distinguish between multiple, simultaneously active sources of sound is a remarkable quality that is often taken for granted. This capability has been studied extensively within the speech processing community, and many attempts have been made to imitate it. Automatic speech processing systems have yet to perform at a level akin to human proficiency [1] and frequently face the quintessential ‘cocktail party problem’: inadequate processing of the target speaker(s) when multiple speakers are present in the scene [2].

