Abstract

This paper addresses a speech coder that uses a text-to-speech (TTS) synthesis system to achieve very low bit rates (below 1 kbps). The main focus of the work is the accurate coding of the pitch (f0) and gain contours, which are principal components of prosody. This is of paramount interest because correct prosody increases naturalness, and an efficient coding scheme provides high coding gain. Together with the phonetic transcription, the f0 and gain contours constitute the parameters that the TTS system needs to synthesize the speech signal. Piecewise linear approximation is used to code the f0 parameter. A technique that minimizes the bit rate while keeping the f0 error below a given threshold is described. To obtain both high compression and smoothly changing gain contours, the variance of the signal, averaged over each half-phoneme length, is transmitted as gain information. With single-speaker stimuli and a priori text transcription information, we obtained natural-sounding speech at an average bit rate of about 300 bps.
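As an illustration only, the two prosody parameters described above can be sketched in a few lines of Python. The abstract does not give the actual algorithm, so everything here is an assumption: a simple greedy segmentation stands in for the bit-rate-minimizing technique (it bounds the f0 error but is not guaranteed to use the fewest breakpoints), the f0 contour is assumed to be sampled once per frame, and the names `piecewise_linear_breakpoints`, `reconstruct`, and `half_phoneme_gains` are hypothetical.

```python
from typing import List, Sequence, Tuple


def max_interp_error(f0: Sequence[float], start: int, end: int) -> float:
    """Largest absolute gap between f0[start..end] and the straight line
    joining the two endpoint values."""
    span = end - start
    worst = 0.0
    for i in range(start + 1, end):
        line = f0[start] + (f0[end] - f0[start]) * (i - start) / span
        worst = max(worst, abs(f0[i] - line))
    return worst


def piecewise_linear_breakpoints(
    f0: Sequence[float], max_err_hz: float
) -> List[Tuple[int, float]]:
    """Greedy stand-in (an assumption, not the paper's method): extend each
    segment while the interpolation error stays at or below max_err_hz."""
    if len(f0) < 2:
        return [(i, v) for i, v in enumerate(f0)]
    last = len(f0) - 1
    points = [(0, f0[0])]
    start = 0
    while start < last:
        end = start + 1
        while end < last and max_interp_error(f0, start, end + 1) <= max_err_hz:
            end += 1
        points.append((end, f0[end]))
        start = end
    return points


def reconstruct(points: Sequence[Tuple[int, float]]) -> List[float]:
    """Decoder side: linearly interpolate between transmitted breakpoints."""
    out = [0.0] * (points[-1][0] + 1)
    for (a, va), (b, vb) in zip(points, points[1:]):
        for i in range(a, b + 1):
            out[i] = va + (vb - va) * (i - a) / (b - a)
    return out


def half_phoneme_gains(
    samples: Sequence[float], phoneme_bounds: Sequence[Tuple[int, int]]
) -> List[float]:
    """One variance value per half phoneme; phoneme_bounds holds hypothetical
    (start, end) sample indices with end exclusive."""
    gains = []
    for start, end in phoneme_bounds:
        mid = (start + end) // 2
        for a, b in ((start, mid), (mid, end)):
            seg = samples[a:b]
            if not seg:
                gains.append(0.0)
                continue
            mean = sum(seg) / len(seg)
            gains.append(sum((x - mean) ** 2 for x in seg) / len(seg))
    return gains
```

In this sketch only the breakpoints and two gain values per phoneme would need to be transmitted alongside the phonetic transcription; how those values are quantized and how the threshold is chosen are left open, since the abstract does not specify them.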
