Abstract

This paper reports a study of the perceptual consequences of the loss of time and frequency resolution inherent in vocoded speech, together with an evaluation of an adaptive resolution scheme. A cepstrum vocoder that adapted its time and frequency resolution according to the voiced–unvoiced nature of the input speech was simulated on a computer. Speech processed by the vocoder was evaluated subjectively, and several tentative conclusions regarding time-frequency resolution and speech quality were drawn. The results suggest that (1) 20-msec time resolution is adequate for vocoder applications, (2) adapting to better time resolution in unvoiced regions and in regions of voiced–unvoiced and unvoiced–voiced transitions improves speech quality in systems that do not normally maintain 20-msec or better time resolution, (3) frequency resolution may be reduced considerably in unvoiced and transition regions with no noticeable degradation in speech quality, and (4) time-frequency resolution trading may occur in the speech perception process.

Subject Classification: 70.40, 70.55, 70.50.
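As a rough illustration of the time-frequency trade the abstract refers to, the sketch below uses the reciprocal rule of thumb that an analysis window of length T resolves frequencies about 1/T apart (the exact constant depends on the window shape). The function names and the per-class window lengths are hypothetical, chosen only for illustration; the abstract does not state the vocoder's actual settings, and this is not the authors' implementation.

```python
def frequency_resolution_hz(window_ms: float) -> float:
    """Approximate frequency resolution (Hz) of an analysis window.

    Uses the reciprocal rule of thumb: resolution ~ 1 / window length.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return 1000.0 / window_ms


# Hypothetical window lengths per frame class (assumed values, not taken
# from the paper): shorter windows give better time resolution in
# unvoiced and transition regions at the cost of frequency resolution.
WINDOW_MS = {"voiced": 40.0, "unvoiced": 10.0, "transition": 10.0}


def choose_window_ms(frame_class: str) -> float:
    """Pick an analysis window length from the frame's voicing class."""
    return WINDOW_MS[frame_class]


if __name__ == "__main__":
    for frame_class in ("voiced", "unvoiced", "transition"):
        window_ms = choose_window_ms(frame_class)
        print(frame_class, window_ms, frequency_resolution_hz(window_ms))
```

Under this rule of thumb, a 20-msec window corresponds to a frequency resolution of roughly 50 Hz, and halving the window length doubles that figure.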
