Abstract

An extremely low bit rate speech coder based on a recognition/synthesis paradigm is proposed. The coder produces the speech signal in much the same way as concatenative synthesis in text-to-speech (TTS) systems. Hence database construction, unit selection, and prosody modification, the major components of concatenative TTS, are used to implement the coder. The synthesis units are found automatically in a large database using a joint segmentation/classification scheme. Unit selection is performed by dynamic programming (DP) with two cost functions, an acoustic target cost and a concatenation cost, to increase naturalness as well as intelligibility. Prosodic differences between the selected unit and the input segment are compensated for by time-scale and pitch modifications based on the harmonic plus noise model (HNM) framework. In single-speaker tests, the proposed scheme produced intelligible and natural-sounding speech at an average bit rate of about 580 b/s.
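
The DP unit selection step can be illustrated with a minimal Viterbi-style sketch. It is not the coder's actual implementation: the function name `select_units`, the table layout, and the simplification that every input segment shares the same candidate set are assumptions made for illustration, and the cost values would in practice come from the acoustic target and concatenation cost functions, which the abstract does not specify.

```python
def select_units(target_cost, concat_cost):
    """Pick one candidate unit per input segment by dynamic programming.

    target_cost[t][j]: hypothetical acoustic target cost of using
                       candidate j for input segment t.
    concat_cost[i][j]: hypothetical cost of joining candidate i to
                       candidate j (assumed the same at every boundary).
    Returns (path, total_cost), where path[t] is the chosen candidate.
    """
    n_seg = len(target_cost)
    n_cand = len(target_cost[0])

    # best[j]: lowest cumulative cost of any path ending in candidate j.
    best = list(target_cost[0])
    back = []  # back[t-1][j]: best predecessor of candidate j at segment t

    for t in range(1, n_seg):
        new_best, pointers = [], []
        for j in range(n_cand):
            # Cheapest predecessor, counting the join cost into j.
            i_best = min(range(n_cand),
                         key=lambda i: best[i] + concat_cost[i][j])
            pointers.append(i_best)
            new_best.append(best[i_best] + concat_cost[i_best][j]
                            + target_cost[t][j])
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    last = min(range(n_cand), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Toy costs: candidate 1 matches segment 1 best, but joins are expensive.
    target = [[1, 5], [5, 1], [1, 5]]
    joins = [[0, 10], [10, 0]]
    print(select_units(target, joins))
```

With the toy costs above, the high join penalty keeps the path on a single candidate even though another candidate matches the middle segment better. This is the trade-off between target and concatenation cost that the abstract describes.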
