Abstract

The ideal binary mask (IBM) has been adopted as a computational goal in computational auditory scene analysis (CASA) algorithms. Only time–frequency (T-F) units whose local signal-to-noise ratio (SNR) exceeds a local criterion (LC) are assigned the value 1 in the binary mask; all other units are assigned 0. However, there are two problems with employing the IBM in source separation applications. First, an LC that is optimal at one SNR may not be appropriate at other SNRs. Second, binary weighting may cause some regions of the synthesized speech to be discarded at the output. If one employs variable weights, as opposed to the hard-limiting weights (i.e., 0 or 1) used in the IBM, both problems can be alleviated considerably. In this chapter, a novel auditory-based mask, called the ideal multi-threshold mask (IMM), is proposed for use in source separation applications. To show the potential of the new mask, a minimum mean-square error (MMSE)-based method is proposed to estimate the IMM within a monaural speech enhancement system. Various objective and subjective evaluation criteria show that the new speech enhancement system outperforms a recently introduced enhancement technique.
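The contrast between hard-limiting and variable weights can be illustrated with a minimal sketch. The IBM rule below follows the description above (1 where the local SNR exceeds the LC, 0 elsewhere). The abstract does not give the IMM's definition, so `multi_threshold_mask`, its threshold values, and its weights are hypothetical stand-ins for "a mask with several SNR thresholds and graded weights", not the chapter's actual formulation. All function and parameter names are assumptions, and the inputs are assumed to be per-unit speech and noise energies that are known in advance (the "ideal" setting).

```python
import numpy as np

def local_snr_db(speech_energy, noise_energy, eps=1e-12):
    """Local SNR in dB for each T-F unit; eps guards against division by zero."""
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    return 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion (LC), else 0."""
    snr = local_snr_db(speech_energy, noise_energy)
    return (snr > lc_db).astype(float)

def multi_threshold_mask(speech_energy, noise_energy, thresholds_db, weights):
    """Hypothetical graded mask (an assumption, not the chapter's IMM).

    thresholds_db: increasing SNR thresholds in dB.
    weights: one weight per SNR band, so len(weights) == len(thresholds_db) + 1.
    Each unit receives the weight of the band its local SNR falls into.
    """
    thresholds = np.asarray(thresholds_db, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.size != thresholds.size + 1:
        raise ValueError("need exactly one more weight than thresholds")
    snr = local_snr_db(speech_energy, noise_energy)
    # right=True counts only thresholds that the SNR strictly exceeds,
    # matching the strict comparison used in ideal_binary_mask.
    band = np.digitize(snr, thresholds, right=True)
    return weights[band]

# Toy 2x2 grid of T-F unit energies (illustrative values only).
speech = np.array([[0.5, 10.0], [0.1, 100.0]])
noise = np.ones_like(speech)

ibm = ideal_binary_mask(speech, noise, lc_db=0.0)
graded = multi_threshold_mask(speech, noise,
                              thresholds_db=[-5.0, 5.0, 15.0],
                              weights=[0.0, 0.3, 0.7, 1.0])
```

In this toy grid, the unit with a local SNR of about -3 dB is discarded outright by the binary mask at an LC of 0 dB, whereas the graded mask retains it with a reduced weight. This is the kind of behaviour the variable-weight argument above appeals to.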
