Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN

Kazuhiro Kobayashi, Tomoki Toda

doi:10.23919/eusipco47968.2020.9287721

Abstract

In this paper, we propose a low-latency speech enhancement technique for electrolaryngeal (EL) speech based on a multi-task CLDNN. Although an electrolarynx can produce relatively intelligible speech, laryngectomees still suffer from degraded speech naturalness caused by its mechanical excitation signals. To solve this problem, an EL speech enhancement technique has been proposed based on a CLDNN, a network consisting of convolutional, recurrent, and fully connected layers. In this technique, an input feature vector of the EL speech is converted into several vocoder parameters, such as excitation parameters and spectral parameters, by expert CLDNNs optimized for each feature. However, this technique is difficult to use in speech communication because its bi-directional recurrent layers cause a large delay: they must wait for the end of the utterance. To address this issue, we propose a multi-task CLDNN with uni-directional recurrent layers for low-latency EL speech enhancement. Moreover, to achieve performance comparable to that of the bi-directional CLDNN, we also propose the following techniques: 1) knowledge distillation, 2) data augmentation, and 3) phonetic regularization. The experimental results demonstrate that the proposed method achieves objective results comparable to those of the bi-directional CLDNN and outperforms it in naturalness and speech intelligibility in the noisy condition.
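
The latency argument and the distillation idea can be illustrated with a small sketch. It is not the authors' implementation: the scalar recurrence, the function names, the `decay` and `alpha` parameters, and the mean-squared-error form of the distillation loss are all assumptions chosen for illustration, and any lookahead the convolutional layers might add is ignored. The sketch shows that a forward-only recurrence can emit each frame as soon as it arrives, whereas a backward pass needs every later frame first.

```python
from typing import List


def forward_states(frames: List[float], decay: float = 0.5) -> List[float]:
    """Uni-directional recurrence: the state at frame t uses frames 0..t only."""
    states, h = [], 0.0
    for x in frames:
        h = decay * h + x  # toy stand-in for a recurrent layer update
        states.append(h)
    return states


def backward_states(frames: List[float], decay: float = 0.5) -> List[float]:
    """Backward recurrence: the state at frame t uses frames t..T-1."""
    return forward_states(frames[::-1], decay)[::-1]


def wait_in_frames(num_frames: int, bidirectional: bool) -> List[int]:
    """Future frames that must arrive before frame t's output can be emitted."""
    if bidirectional:
        # The backward pass starts at the last frame of the utterance.
        return [num_frames - 1 - t for t in range(num_frames)]
    return [0] * num_frames


def distillation_loss(student: List[float], teacher: List[float],
                      target: List[float], alpha: float = 0.5) -> float:
    """Hypothetical loss: blend of error to the target and to the teacher output."""
    def mse(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

    return (1.0 - alpha) * mse(student, target) + alpha * mse(student, teacher)
```

Under these assumptions, the uni-directional model plays the role of the student and the bi-directional model the role of the teacher; `wait_in_frames(T, True)` grows with the utterance length `T`, which is the delay the abstract attributes to the bi-directional layers.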

Similar Papers

A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion
Kou Tanaka ... Sakriani Sakti
25 Aug 2013

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation
Kou Tanaka ... Tomoki Toda
14 Sep 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction
Kou Tanaka ... Sakriani Sakti
01 Dec 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement
Kou Tanaka ... Tomoki Toda
01 May 2014
