Abstract

An electrolarynx is a speaking aid that artificially generates excitation sounds to help laryngectomees produce electrolaryngeal (EL) speech. Although EL speech is quite intelligible, its naturalness suffers significantly from the unnatural fundamental frequency (F0) patterns of the mechanical excitation sounds. To make EL speech sound more natural, we have proposed a method that automatically controls the F0 patterns of the excitation sounds generated by the electrolarynx using statistical F0 prediction, which predicts F0 patterns from the produced EL speech in real time. In our previous work, we developed a prototype system by implementing the proposed real-time prediction method in a physical electrolarynx, and found that the improvements in the naturalness of EL speech yielded by the prototype tend to be smaller than those yielded by batch-type prediction. In this paper, we examine the negative impact of the latency of real-time prediction on F0 prediction accuracy and, to alleviate it, propose two methods: 1) modeling of segmented continuous F0 (CF0) patterns and 2) prediction of forthcoming F0 values. The experimental results demonstrate that 1) the conventional real-time prediction method needs a large delay to predict CF0 patterns and 2) the proposed methods have a positive impact on real-time prediction.

