Abstract
In this paper, we propose a low-latency speech enhancement technique for electrolaryngeal (EL) speech based on a multi-task CLDNN. Although EL speech is relatively intelligible, laryngectomees always suffer from degraded speech naturalness caused by the mechanical excitation signals. To solve this problem, an EL speech enhancement technique has previously been proposed that is based on a CLDNN, a network consisting of convolutional, recurrent, and fully connected layers. In this technique, an input feature vector of the EL speech is converted into several vocoder parameters, such as excitation parameters and spectral parameters, by expert CLDNNs, each optimized for one feature. However, this technique is difficult to use in speech communication because its bi-directional recurrent layers must wait for the end of the utterance, which causes a large delay. To address this issue, we propose a multi-task CLDNN with uni-directional recurrent layers for low-latency EL speech enhancement. Moreover, to achieve performance comparable to that of the bi-directional CLDNN, we also propose the following techniques: 1) knowledge distillation, 2) data augmentation, and 3) phonetic regularization. The experimental results demonstrate that the proposed method achieves objective results comparable to those of the bi-directional CLDNN and outperforms it in naturalness and speech intelligibility in the noisy condition.
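The latency argument can be illustrated with a toy example. The sketch below is not the CLDNN described here: it uses a hypothetical scalar recurrence (the functions `run_forward` and `run_bidirectional` and the weights `w` and `u` are assumptions made for illustration) only to show why a uni-directional layer can emit each frame as soon as it arrives, while a bi-directional layer needs the whole utterance.

```python
import math

def run_forward(frames, w=0.5, u=0.3):
    """Toy uni-directional recurrence: h_t depends only on frames[0..t]."""
    h, out = 0.0, []
    for x in frames:
        h = math.tanh(w * x + u * h)  # the state carries past context only
        out.append(h)
    return out

def run_bidirectional(frames, w=0.5, u=0.3):
    """Toy bi-directional recurrence: forward pass plus a time-reversed pass."""
    fwd = run_forward(frames, w, u)
    # The backward pass starts from the LAST frame, so it cannot begin
    # until the whole utterance has been received.
    bwd = run_forward(frames[::-1], w, u)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# Hypothetical one-dimensional "features" for a five-frame utterance.
utterance = [0.2, -0.1, 0.4, 0.9, -0.3]
prefix = utterance[:3]  # what a streaming system has seen after 3 frames

# Uni-directional outputs for the first 3 frames do not change when
# later frames arrive, so they can be emitted immediately.
causal_is_streamable = run_forward(prefix) == run_forward(utterance)[:3]

# Bi-directional outputs for the same frames do change once the rest
# of the utterance is known, so they cannot be finalized early.
bidir_is_streamable = run_bidirectional(prefix) == run_bidirectional(utterance)[:3]

print(causal_is_streamable, bidir_is_streamable)
```

In this toy setting, the outputs of the uni-directional pass on a prefix match the corresponding outputs on the full utterance, whereas the bi-directional outputs do not. This is the property that motivates replacing the bi-directional recurrent layers for real-time use.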