Abstract

A method for converting whispered speech into normal speech based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed. First, MFCC features were extracted from whispered speech and normal speech, and a mapping between the MFCC parameters of whispered speech and those of normal speech was learned with a Gaussian mixture model (GMM). Then, the GMM was used to estimate the normal-speech MFCC parameters corresponding to the whispered speech. Finally, the whispered speech was converted into normal speech by inverting these MFCC features. The experimental results showed that the cepstral distortion (CD) of speech converted by the proposed method was 21% lower than that of speech converted with linear predictive coefficient (LPC) features. The mean opinion score (MOS) was 3.56, and satisfactory intelligibility and sound quality were achieved.
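
The abstract does not specify how the GMM maps whispered MFCCs to normal MFCCs or how CD is computed. The sketch below assumes a common choice: a joint-density GMM trained on time-aligned [whisper; normal] MFCC pairs, with conversion by the conditional expectation E[y | x]. It also uses one common definition of cepstral distortion (in dB, excluding c0). The function names, the pre-trained GMM parameters, and the CD formula are illustrative assumptions, not the paper's exact formulation. GMM training (e.g. by EM) and frame alignment (e.g. DTW) are not shown.

```python
import numpy as np


def _log_gauss(x, mean, cov):
    """Log-density of N(mean, cov) at each row of x (shape T x d)."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Whiten the residuals so that their squared norm is the Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def gmm_convert(x, weights, means, covs, dx):
    """Map whispered MFCC frames x (T x dx) to estimated normal MFCC frames.

    weights (M,), means (M, dx+dy) and covs (M, dx+dy, dx+dy) are the
    parameters of a joint GMM over [x; y], assumed to be trained already.
    """
    x = np.atleast_2d(x)
    n_comp = weights.shape[0]
    # Posterior p(m | x) under the marginal GMM of the whispered features.
    log_post = np.stack(
        [np.log(weights[m]) + _log_gauss(x, means[m, :dx], covs[m, :dx, :dx])
         for m in range(n_comp)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    y_hat = np.zeros((x.shape[0], means.shape[1] - dx))
    for m in range(n_comp):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx, s_yx = covs[m, :dx, :dx], covs[m, dx:, :dx]
        # E[y | x, m] = mu_y + S_yx S_xx^{-1} (x - mu_x), computed row-wise.
        cond = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y_hat += post[:, [m]] * cond
    return y_hat


def cepstral_distortion(c_ref, c_conv):
    """Mean cepstral distortion in dB over frames, excluding c0."""
    diff = np.atleast_2d(c_ref)[:, 1:] - np.atleast_2d(c_conv)[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

Consider a single-component GMM with zero means and joint covariance [[1, 0.5], [0.5, 1]]. Here `gmm_convert(np.array([[2.0]]), ...)` returns `[[1.0]]`, which is the linear regression y = 0.5x. With several components, the output is a posterior-weighted blend of such per-component regressions.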

Highlights

  • We report a method for converting whispered speech to normal speech based on Mel frequency cepstral coefficient (MFCC) features and a Gaussian mixture model (GMM)

  • To account for the sparsity of speech, we propose using the L1/2 algorithm to invert the MFCC features, which yields good perceptual quality
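
The highlight does not spell out how the L1/2 algorithm inverts the MFCCs. The sketch below assumes one plausible reading:

1. Undo the DCT and the logarithm to recover the mel filterbank energies e.
2. Estimate a sparse, non-negative power spectrum s from the underdetermined system e ≈ F s, where F is the mel filterbank. The estimate minimizes ||F s - e||^2 + lambda * sum_k s_k^(1/2) using iterative half thresholding, a standard solver for L1/2 regularization.

The triangular filterbank, the DCT-II normalization, lambda, the iteration count, and all function names are illustrative assumptions. Phase reconstruction (e.g. Griffin-Lim), which is needed to obtain a waveform, is not shown.

```python
import numpy as np
from scipy.fft import dct, idct


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank F of shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_from_power(power, fb, n_ceps, eps=1e-10):
    """Forward model: MFCC = DCT-II of the log mel energies, truncated."""
    return dct(np.log(fb @ power + eps), type=2, norm="ortho")[:n_ceps]


def mfcc_to_mel(ceps, n_mels):
    """Undo the DCT (zero-padding dropped coefficients) and the logarithm."""
    padded = np.zeros(n_mels)
    padded[: len(ceps)] = ceps
    return np.exp(idct(padded, type=2, norm="ortho"))


def half_threshold(t, lam_mu):
    """Elementwise argmin_x (x - t)^2 + lam_mu * |x|^(1/2) (half thresholding)."""
    out = np.zeros_like(t, dtype=float)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam_mu ** (2.0 / 3.0)
    big = np.abs(t) > thresh
    arg = (lam_mu / 8.0) * (np.abs(t[big]) / 3.0) ** -1.5
    phi = np.arccos(np.clip(arg, -1.0, 1.0))
    out[big] = (2.0 / 3.0) * t[big] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out


def invert_mel_l_half(mel, fb, lam=1e-4, n_iter=200):
    """Estimate a sparse, non-negative power spectrum s with fb @ s close to mel."""
    mu = 0.99 / np.linalg.norm(fb, 2) ** 2  # step size must stay below 1 / ||F||^2
    s = np.zeros(fb.shape[1])
    for _ in range(n_iter):
        grad_step = s + mu * fb.T @ (mel - fb @ s)
        # Half thresholding, then clipping to s >= 0 (power is non-negative).
        s = np.maximum(half_threshold(grad_step, lam * mu), 0.0)
    return s
```

Clipping negative values after half thresholding gives the proximal step for the L1/2 penalty restricted to s >= 0. As a result, each iteration does not increase the objective as long as the step size `mu` stays below 1/||F||^2. A full inversion would apply `mfcc_to_mel` and then `invert_mel_l_half` frame by frame before phase reconstruction.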


Summary

Introduction

Whispered speech is a mode of articulation that differs from normal speech [1]. It is produced at a low sound level without vibration of the vocal cords. As a result, its voiced sounds have no fundamental frequency and carry 20 dB less energy than those of normal speech [2]. Because of these characteristics, whispered speech is widely used in places where loud noise is not permitted, such as conference rooms, libraries, and concert halls.
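
For scale, if the 20 dB figure is read as an energy (power) ratio, whispered voiced sounds carry roughly 1% of the energy of their normally phonated counterparts:

```latex
10\log_{10}\frac{E_{\text{normal}}}{E_{\text{whisper}}} = 20\ \text{dB}
\quad\Longrightarrow\quad
\frac{E_{\text{whisper}}}{E_{\text{normal}}} = 10^{-20/10} = 0.01
```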

