Abstract

The conceptual simplicity of the classical channel VOCODER, when combined with modern computational power and signal processing theory, provides a powerful means for systematically investigating the perceptual effects of speech-related physical parameters. STRAIGHT [Kawahara et al., Speech Commun. 27, 187–207 (1999)], a modern version of the channel VOCODER that is also an extension of pitch-synchronous analysis and synthesis, generates natural-sounding resynthesized speech from the analyzed smooth time-frequency surface and source parameters such as F0. This high-quality resynthesis enables close investigation of how naturalness deteriorates as a function of feature modifications in the decomposed parameter domain: for example, the detailed shape of an F0 trajectory, the underlying parameters that determine F0 trajectory dynamics, the group delay alignment of excitation pulses, and the aperiodicity/periodicity ratio of the excitation source. One potential advantage of this strategy stems from the fact that our perceptual function is highly nonlinear. The other is that the parameter set is virtually independent, which allows precise control of parameter deviations from the original analysis results. An overview of recent findings and demonstrations of modifications will be presented. [Work supported by a CREST grant of the Japanese Science and Technology Corporation.]

