Abstract

An electrolarynx (EL) is a widely used device that mechanically generates excitation signals, making it possible for laryngectomees to produce EL speech without vocal fold vibrations. Although EL speech sounds relatively intelligible, it is significantly less natural than normal speech owing to its mechanical excitation signals. To address this issue, a statistical voice conversion (VC) technique based on Gaussian mixture models (GMMs) has been applied to EL speech enhancement. In this technique, input EL speech is converted into target normal speech by mapping spectral features of the EL speech to spectral and excitation parameters of normal speech using GMMs. Although this technique significantly improves the naturalness of EL speech, the enhanced EL speech is still far from the target normal speech. To improve the performance of statistical EL speech enhancement, in this paper we propose an EL-to-speech conversion method based on CLDNNs, which consist of convolutional layers, long short-term memory (LSTM) recurrent layers, and fully connected deep neural network layers. Three CLDNNs are trained: one to convert EL speech spectral features into spectral and band-aperiodicity parameters, one to convert them into unvoiced/voiced symbols, and one to convert them into continuous $F_{0}$ patterns. The experimental results demonstrate that the proposed method significantly outperforms the conventional method in terms of both objective evaluation metrics and subjective evaluation scores.
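To make the three-network CLDNN setup concrete, the following is a minimal PyTorch sketch of the convolutional-LSTM-fully-connected stack described above. It is an illustration under stated assumptions, not the paper's implementation: the layer sizes, kernel width, feature dimensions, and the variable names `net_spec`, `net_uv`, and `net_f0` are all hypothetical choices made for this example.

```python
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    """Minimal CLDNN: a 1-D convolutional front-end over the frame
    sequence, LSTM recurrent layers, and fully connected output layers.
    All sizes here are illustrative assumptions."""

    def __init__(self, in_dim, out_dim, conv_channels=64,
                 lstm_units=256, num_lstm_layers=2):
        super().__init__()
        # Convolutional layers over the time axis of the feature sequence
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Long short-term memory recurrent layers
        self.lstm = nn.LSTM(conv_channels, lstm_units,
                            num_layers=num_lstm_layers, batch_first=True)
        # Fully connected deep neural network layers
        self.fc = nn.Sequential(
            nn.Linear(lstm_units, lstm_units),
            nn.ReLU(),
            nn.Linear(lstm_units, out_dim),
        )

    def forward(self, x):
        # x: (batch, frames, in_dim) EL-speech spectral features
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(h)
        return self.fc(h)

# Three networks, one per target stream (all dimensions assumed):
mcep_dim = 25                       # EL spectral feature dimension
net_spec = CLDNN(mcep_dim, 25 + 5)  # spectral + band-aperiodicity params
net_uv   = CLDNN(mcep_dim, 1)       # unvoiced/voiced symbol per frame
net_f0   = CLDNN(mcep_dim, 1)       # continuous F0 pattern
```

At conversion time, one would run the same EL spectral feature sequence through all three networks and combine the predicted spectral, aperiodicity, voicing, and $F_{0}$ streams in a vocoder to synthesize the enhanced speech; the specific vocoder and training losses are not specified by this sketch.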
