Abstract

Throat microphone (TM) speech has a narrow bandwidth and sounds unnatural compared with acoustic microphone (AM) recordings. Although TM speech is largely unaffected by environmental noise, it suffers from reduced naturalness and intelligibility. In this paper, we focus on enhancing the perceptual quality of TM speech with a machine learning technique that modifies the spectral envelope and vocal tract parameters. Mel-frequency cepstral coefficients (MFCCs) are extracted as speech features, and a neural network is trained to map the TM features to the corresponding AM features. This improves the perceptual quality of TM speech relative to AM speech by estimating and restoring the missing high-frequency components (4–8 kHz) from the low-frequency band (0–4 kHz) of the TM signal. Least-squares estimation and the Inverse Short-Time Fourier Transform Magnitude method are then applied to the estimated power spectrum to reconstruct the speech signal. The proposed technique is evaluated on the ATR503 dataset. Simulation results show improved performance for speech enhancement in adverse environments. The aim of this study is natural human–machine interaction for people with vocal tract impairments.
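As a rough, hypothetical illustration of the feature-mapping stage described above (not the paper's actual implementation), the sketch below frames a signal, computes simplified mel-cepstral features standing in for MFCCs, and fits a least-squares linear map from TM features to AM features as a stand-in for the neural network; all signals, sampling rates, and dimensions here are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice the signal into overlapping Hann-windowed frames."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hanning(frame_len)

def mel_filterbank(n_filters=20, n_fft=256, sr=8000):
    """Build a triangular mel filterbank over the 0 to sr/2 band."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft // 2 + 1) * mel_to_hz(mels) / (sr / 2)).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfcc_like(x, sr=8000, n_coef=13):
    """Simplified mel-cepstral features: power spectrum -> log mel -> DCT-II."""
    frames = frame_signal(x)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = np.log(spec @ mel_filterbank(sr=sr).T + 1e-10)
    k = np.arange(mel.shape[1])
    dct = np.cos(np.pi * (k[:, None] + 0.5)
                 * np.arange(n_coef)[None, :] / mel.shape[1])
    return mel @ dct

# Toy stand-ins for parallel TM/AM recordings (random noise, for shape only).
rng = np.random.default_rng(0)
tm = rng.standard_normal(8000)  # hypothetical throat-microphone signal
am = rng.standard_normal(8000)  # hypothetical acoustic-microphone signal

X, Y = mfcc_like(tm), mfcc_like(am)
# Least-squares linear map TM features -> AM features
# (a simple stand-in for the paper's neural-network mapping).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ W
```

In the actual method the mapped features would then drive spectral reconstruction via least-squares estimation and the ISTFTM procedure; a real system would also train a nonlinear network on genuine parallel TM/AM data rather than a linear map on noise.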
