Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion

Yaogen Yang, Haozhe Zhang, Zexin Cai, Yao Shi, Ming Li, Dong Zhang, Xiaojun Ding, Jianhua Deng, Jie Wang

doi:10.1016/j.bspc.2022.104279

Abstract

An electrolarynx (EL) is a medical device that generates speech for people who have lost their biological larynx. However, EL speech sounds unnatural and has poor intelligibility because of the monotonous pitch and the mechanical excitation of the EL device. This paper proposes an end-to-end voice conversion method to enhance EL speech. We adopt a speaker-independent automatic speech recognition (ASR) model to extract bottleneck features, which serve as the intermediate phonetic features for enhancement. Our system has two stages. In stage one, a parallel non-autoregressive model maps the bottleneck feature vectors of EL speech to the corresponding feature vectors of normal speech. In stage two, another voice conversion model maps these normal-speech bottleneck feature vectors directly to a normal-speech Mel-spectrogram, and a MelGAN-based vocoder then converts the Mel-spectrogram into a waveform. In addition, we incorporate data augmentation and transfer learning to improve conversion performance. Experimental results show that the proposed method outperforms our baseline methods and performs well in terms of naturalness and intelligibility. Audio samples are available online at https://haydencaffrey.github.io/el/index.html.
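The data flow of the two-stage framework described above can be sketched as a composition of four components: bottleneck extraction, stage-one feature refinement, stage-two feature-to-Mel conversion, and vocoding. This is a toy illustration under stated assumptions, not the authors' implementation. All names (`TwoStageEnhancer`, `frame_signal`, `non_autoregressive`, `linear`, `toy_vocoder`) are hypothetical: simple framing stands in for the ASR bottleneck extractor, per-frame linear maps stand in for the trained stage-one and stage-two networks, and frame concatenation stands in for the MelGAN vocoder. The `non_autoregressive` helper captures only the defining property that no output frame depends on a previously generated output frame; real non-autoregressive models also use context across input frames.

```python
"""Toy sketch of a two-stage EL speech enhancement pipeline.

Every component is a hypothetical stand-in: a real system would use a
trained ASR encoder, trained conversion networks, and a MelGAN vocoder.
"""
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]
Frames = List[Frame]


def frame_signal(wave: List[float], frame_len: int) -> Frames:
    """Stand-in for the ASR bottleneck extractor: split audio into
    non-overlapping frames, dropping any trailing partial frame."""
    return [wave[i:i + frame_len]
            for i in range(0, len(wave) - frame_len + 1, frame_len)]


def non_autoregressive(frame_fn: Callable[[Frame], Frame]) -> Callable[[Frames], Frames]:
    """Map every frame independently of previously generated outputs,
    so all output frames could be computed in parallel."""
    def run(frames: Frames) -> Frames:
        return [frame_fn(f) for f in frames]
    return run


def linear(weights: List[List[float]]) -> Callable[[Frame], Frame]:
    """Hypothetical per-frame linear layer standing in for a trained network."""
    def apply(frame: Frame) -> Frame:
        return [sum(w * x for w, x in zip(row, frame)) for row in weights]
    return apply


def toy_vocoder(mel: Frames) -> List[float]:
    """Stand-in for a MelGAN-style vocoder: concatenate frames into samples."""
    return [x for frame in mel for x in frame]


@dataclass
class TwoStageEnhancer:
    extract_bnf: Callable[[List[float]], Frames]  # speaker-independent ASR encoder
    refine_bnf: Callable[[Frames], Frames]        # stage 1: EL BNF -> normal BNF
    bnf_to_mel: Callable[[Frames], Frames]        # stage 2: BNF -> Mel-spectrogram
    vocoder: Callable[[Frames], List[float]]      # Mel-spectrogram -> waveform

    def enhance(self, el_wave: List[float]) -> List[float]:
        bnf_el = self.extract_bnf(el_wave)      # EL speech -> bottleneck features
        bnf_normal = self.refine_bnf(bnf_el)    # refined toward normal speech
        mel = self.bnf_to_mel(bnf_normal)       # features -> Mel-spectrogram
        return self.vocoder(mel)                # Mel-spectrogram -> waveform


# Usage with toy weights (arbitrary values chosen only for illustration).
enhancer = TwoStageEnhancer(
    extract_bnf=lambda w: frame_signal(w, 2),
    refine_bnf=non_autoregressive(linear([[2.0, 0.0], [0.0, 2.0]])),
    bnf_to_mel=non_autoregressive(linear([[1.0, 1.0], [1.0, -1.0]])),
    vocoder=toy_vocoder,
)
out = enhancer.enhance([1.0, 2.0, 3.0, 4.0])
print(out)  # [6.0, -2.0, 14.0, -2.0]
```

Keeping each stage behind a plain callable mirrors the decoupling in the two-stage design: the stage-one refinement model can be retrained or swapped without touching the stage-two conversion model or the vocoder.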


Journal: Biomedical Signal Processing and Control
Publication Date: Oct 13, 2022
Citations: 4
