Abstract

A decision-tree-based quantization scheme is described for a very low bit rate speech coder based on HMMs. The encoder performs HMM-based phoneme recognition; the recognized phonemes, state durations, and F0 sequence are then quantized, Huffman coded, and transmitted. In the decoder, sequences of mel-cepstral coefficient vectors and F0 values are generated from the concatenated HMM using the HMM-based speech synthesis technique. Finally, a speech waveform is synthesized by the MLSA filter from the generated mel-cepstral coefficient and F0 sequences. In the previous system, an MSD-VQ codebook was trained for each phoneme for F0 quantization. Although that scheme quantizes F0 sequences efficiently, achieving better speech quality requires larger codebooks, which increases the bit rate of the system. To avoid this problem, we cluster F0 sequences using phonetic decision trees and train a codebook for each leaf node. In both encoding and decoding, the codebook to be used is determined by tracing the decision tree. This allows smaller codebooks to be used, since the number of codebooks can be increased without increasing the bit rate. A subjective listening test shows that the proposed scheme improves the quality of coded speech.
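
The codebook selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tree questions, the context fields (`prev`, `phoneme`), the codebook entries, and the use of plain Euclidean nearest-neighbor search in place of MSD-VQ are all assumptions made for illustration. The point it shows is that encoder and decoder reach the same leaf codebook from phonetic context alone (assumed here to be recoverable from the transmitted phoneme sequence), so only the index within that small codebook has to be sent.

```python
import math

class Leaf:
    """Leaf node holding a small F0 codebook (hypothetical values)."""
    def __init__(self, codebook):
        self.codebook = codebook  # list of F0 vectors, e.g. log-F0 per state

class Node:
    """Internal node asking a yes/no question about the phonetic context."""
    def __init__(self, question, yes, no):
        self.question = question  # function: context dict -> bool
        self.yes = yes
        self.no = no

def find_codebook(tree, context):
    # Trace the tree from the root to a leaf using only the context.
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(context) else node.no
    return node.codebook

def encode(tree, context, f0_vec):
    # Return the index of the nearest code vector in the selected codebook.
    cb = find_codebook(tree, context)
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(cb[i], f0_vec))
    return min(range(len(cb)), key=dist)

def decode(tree, context, index):
    # The decoder traces the same tree, so the index alone is sufficient.
    return find_codebook(tree, context)[index]

VOWELS = {"a", "i", "u", "e", "o"}

# Hypothetical tree with three leaves, each holding a 2-entry codebook.
tree = Node(
    lambda ctx: ctx["phoneme"] in VOWELS,
    Node(
        lambda ctx: ctx["prev"] in VOWELS,
        Leaf([[4.8, 4.9, 5.0], [5.2, 5.1, 5.0]]),
        Leaf([[4.6, 4.7, 4.8], [5.0, 5.0, 5.0]]),
    ),
    Leaf([[4.5, 4.5, 4.5], [4.9, 4.8, 4.7]]),
)

context = {"prev": "k", "phoneme": "a"}
index = encode(tree, context, [4.62, 4.71, 4.77])
decoded = decode(tree, context, index)

# Fixed-length cost of one index before entropy coding: it depends on the
# size of the selected codebook, not on how many leaves the tree has.
bits_per_index = math.log2(len(find_codebook(tree, context)))
```

In this toy setup, adding more leaves refines the codebooks without changing `bits_per_index`, which is the trade-off the abstract describes.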
