Abstract

Speech recorded from a throat microphone is robust to surrounding noise, but sounds unnatural compared with speech recorded from a close-speaking microphone. This paper addresses the issue of improving the perceptual quality of throat microphone speech by mapping the speech spectra of the throat microphone signal to those of the close-speaking microphone signal. A neural network model is used to capture the speaker-dependent functional relationship between the feature vectors (cepstral coefficients) of the two speech signals. A method is proposed to ensure the stability of the all-pole synthesis filter. Objective evaluations indicate the effectiveness of the proposed mapping scheme. The advantage of this method is that the model gives a smooth estimate of the spectra of the close-speaking microphone speech, so no distortions are perceived in the reconstructed speech. This mapping technique is also applied to bandwidth extension of telephone speech.
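
The paper proposes its own method to guarantee a stable synthesis filter; as a hedged illustration of the underlying problem (not the paper's method), the sketch below takes a set of estimated linear prediction coefficients, finds the poles of the all-pole synthesis filter 1/A(z), and reflects any pole on or outside the unit circle back inside, which is one standard way of enforcing stability.

```python
import numpy as np

def stabilize_lpc(a, max_radius=0.98):
    """Force the all-pole synthesis filter 1/A(z) to be stable.

    `a` holds the predictor coefficients a_1..a_p of
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.  Any pole of 1/A(z) on or
    outside the unit circle is pulled back to radius `max_radius`.
    (A generic stabilization trick for illustration only; the paper
    proposes its own scheme.)
    """
    poly = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    roots = np.roots(poly)                  # poles of 1/A(z)
    radii = np.abs(roots)
    outside = radii >= 1.0
    roots[outside] = roots[outside] / radii[outside] * max_radius
    stabilized = np.real(np.poly(roots))    # imaginary parts are round-off
    return stabilized[1:]                   # drop the leading 1

# Example: a deliberately unstable 2nd-order predictor with poles at 1.4 and 0.8
a_unstable = np.array([-2.2, 1.12])
a_stable = stabilize_lpc(a_unstable)
print(np.abs(np.roots(np.concatenate(([1.0], a_stable)))))   # all radii < 1
```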

Highlights

  • Speech signal collected by a vibration pickup placed at the throat is clean, but does not sound as natural as speech from a normal microphone

  • Two major issues are addressed in the approach proposed in this paper: (a) a suitable mapping technique to capture the functional relationship between the feature vectors of the two types of speech signals, and (b) an approach to ensure that the estimated feature vectors generated by the model result in a stable all-pole filter for synthesis of speech

  • The weighted linear prediction cepstral coefficients (wLPCCs) are derived from the throat microphone (TM) speech and the normal microphone (NM) speech
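
As a minimal sketch of how LP cepstral coefficients can be derived from LP coefficients and then weighted, the code below uses the standard LPC-to-cepstrum recursion followed by a raised-sine lifter; the recursion is standard, but the lifter and all parameter values are illustrative assumptions, not necessarily the weighting used for the wLPCCs in the paper.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """LP coefficients -> LP cepstral coefficients (standard recursion).

    `a` holds a_1..a_p of the prediction polynomial
    A(z) = 1 - a_1 z^-1 - ... - a_p z^-p.
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c

def weighted_lpcc(a, n_ceps=19, lifter=None):
    """Weight (lifter) the LPCCs with a raised-sine window.

    The raised-sine lifter w_m = 1 + (L/2) sin(pi*m/L) is an
    illustrative choice, not necessarily the paper's weighting.
    """
    L = lifter or n_ceps
    c = lpc_to_cepstrum(a, n_ceps)
    m = np.arange(1, n_ceps + 1)
    return (1.0 + (L / 2.0) * np.sin(np.pi * m / L)) * c

# Hypothetical 10th-order LP vector for one frame of TM or NM speech
a = np.array([1.30, -0.70, 0.20, -0.10, 0.05, -0.02, 0.01, 0.0, 0.0, 0.0])
print(weighted_lpcc(a))
```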

Summary

INTRODUCTION

Speech signal collected by a vibration pickup (called a throat microphone) placed at the throat (near the glottis) is clean, but does not sound as natural as speech from a normal (close-speaking) microphone. Two major issues are addressed in the approach proposed in this paper: (a) a suitable mapping technique to capture the functional relationship between the feature vectors of the two types of speech signals, and (b) an approach to ensure that the estimated feature vectors generated by the model result in a stable all-pole filter for synthesis of speech. Neural network approaches that use a simple nonlinear mapping from narrowband to wideband speech have been exploited to estimate the missing frequency components [9, 10]. The advantage of the proposed method is that no discontinuity is perceived between successive frames of the reconstructed speech, because the network provides a smooth estimate of the wideband normal spectra.
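
As a concrete but hedged sketch of such a frame-wise mapping, a small multilayer perceptron can be trained on time-aligned feature vectors of the two signals; the feature dimension, network size, and training settings below are assumptions for illustration, and random arrays stand in for real TM/NM features.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-ins for real, time-aligned features: one wLPCC vector per frame,
# TM features as network input and NM features as the regression target.
rng = np.random.default_rng(0)
tm_feats = rng.standard_normal((5000, 19))   # hypothetical TM wLPCCs
nm_feats = rng.standard_normal((5000, 19))   # hypothetical NM wLPCCs

# One hidden layer of tanh units is a plausible choice for a
# speaker-dependent spectral mapping; the paper's architecture may differ.
net = MLPRegressor(hidden_layer_sizes=(64,), activation="tanh",
                   max_iter=500, random_state=0)
net.fit(tm_feats, nm_feats)

# At synthesis time each TM frame is mapped to an estimated NM frame;
# because the mapping is applied frame by frame with a smooth model,
# successive estimated spectra vary smoothly as well.
nm_estimate = net.predict(tm_feats[:1])
print(nm_estimate.shape)   # (1, 19)
```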

SPECTRAL CHARACTERISTICS OF TM SPEECH AND NM SPEECH
MAPPING SPECTRAL FEATURES OF TM SPEECH TO NM SPEECH
Features for mapping
Neural network model for mapping spectral features
Experimental results
Bandwidth extension of telephone speech
CONCLUSIONS