Abstract
Speech intelligibility enhancement in noisy environments remains one of the major challenges for hearing-impaired listeners in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature for enhancing speech intelligibility for hearing-impaired listeners. The new feature is constructed by combining four cepstra computed at different time–frequency (T–F) resolutions in order to capture both local and contextual information. MRPNCC vectors, together with binary masking labels computed from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which identifies the binary masking values of the T–F units in the enhancement stage. The enhanced speech is synthesized from the estimated masking values and Wiener-filtered T–F units. Objective experimental results demonstrate that the proposed feature is superior to competing features in terms of HIT-FA, STOI, HASPI, and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for hearing-impaired listeners.
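The binary masking labels mentioned in the abstract are conventionally derived from the local SNR of each T–F unit. The sketch below illustrates this standard construction; the per-unit power arrays, the `ideal_binary_mask` helper, and the 0 dB local criterion are illustrative assumptions, since the abstract does not specify the paper's exact labeling rule.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each time-frequency (T-F) unit as 1 (speech-dominant) or 0
    (noise-dominant) by comparing the local SNR against a criterion lc_db.

    Inputs are per-unit power arrays (channels x frames), e.g. computed
    from the outputs of a gammatone filterbank. The criterion value is a
    free parameter; 0 dB is used here purely for illustration.
    """
    # Local SNR in dB; guard against division by zero in silent units.
    snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_power, 1e-12))
    return (snr_db > lc_db).astype(np.uint8)

# Toy example: 3 gammatone channels x 4 frames of per-unit power.
speech = np.array([[4.0, 0.1, 2.0, 0.5],
                   [0.2, 3.0, 0.1, 1.0],
                   [1.0, 1.0, 0.05, 5.0]])
noise = np.ones_like(speech)  # unit noise power in every T-F unit
mask = ideal_binary_mask(speech, noise, lc_db=0.0)
# Units where speech power exceeds noise power (SNR > 0 dB) are kept.
```

In training, each mask value serves as the target label for the MRPNCC feature vector extracted from the corresponding T–F unit; at enhancement time, the trained classifier predicts these labels for unseen noisy speech.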