Abstract
This work proposes an image-analysis-based algorithm that enhances the time–frequency (T–F) mask obtained in the initial segmentation stage of a monaural speech separation system based on computational auditory scene analysis (CASA), with the aim of improving speech quality and intelligibility. The algorithm consists of four steps: labelling the initial segmentation mask, boundary extraction, active pixel detection, and elimination of the non-active pixels related to noise. In the labelling step, the T–F mask is labelled as a periodicity pixel (P) matrix and a non-periodicity pixel (NP) matrix. Next, speech boundaries are created by connecting all possible nearby P and NP matrices. Some speech boundaries may enclose noisy T–F units as holes; these holes are treated by the proposed algorithm. The algorithm is evaluated with quality and intelligibility measures: signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ), percentage of energy loss (P_EL), percentage of noise residue (P_NR), coherence speech intelligibility index (CSII), normalised covariance metric (NCM), and short-time objective intelligibility (STOI). The experimental results show that, compared with the input noisy speech mixture, the proposed algorithm improves speech quality by increasing the SNR by an average of 9.91 dB and reducing P_NR by an average of 25.6%, and also improves speech intelligibility in terms of CSII, NCM, and STOI.
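The idea of treating a binary T–F mask as an image, labelling its connected regions and locating enclosed holes, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the connectivity rule, how P and NP pixels are distinguished, or how holes are finally treated. The function names `label_regions` and `find_holes`, the choice of 4-connectivity, and the representation of the mask as a list of lists of 0/1 values are all assumptions made for illustration.

```python
from collections import deque


def label_regions(mask, target):
    """Label 4-connected regions of cells equal to `target`.

    Returns (labels, count), where labels[r][c] is 0 for cells that do not
    match `target` and a positive region id otherwise.
    Assumption: 4-connectivity; the source does not state the rule used.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target or labels[r][c]:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == target
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def find_holes(mask):
    """Return the set of (row, col) zero cells fully enclosed by ones.

    A zero region counts as a hole when it never touches the mask border.
    What is done with a hole afterwards is left open here, because the
    abstract does not describe that step.
    """
    labels, _ = label_regions(mask, 0)
    rows, cols = len(mask), len(mask[0])
    # Collect ids of zero regions that reach the border (the background).
    border_ids = set()
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] and (r in (0, rows - 1) or c in (0, cols - 1)):
                border_ids.add(labels[r][c])
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if labels[r][c] and labels[r][c] not in border_ids
    }
```

For a small mask in which a ring of active units surrounds a single inactive unit, `find_holes` returns only the coordinates of that enclosed unit, while the inactive units connected to the mask border are treated as background.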