Abstract
This paper proposes an entropy coding method that further compresses the quantized mel-frequency cepstral coefficients (MFCCs) extracted for distributed speech recognition (DSR). In the ETSI extended DSR standard, MFCCs are compressed along with additional parameters such as pitch and voicing class. We observe that the distribution of MFCCs varies with the voicing class, which makes it possible to design a separate Huffman tree for each voicing class. Exploiting this observation reduces the bit-rate of the compressed MFCCs below that of a Huffman coding method that does not consider voicing class. Experiments show that the proposed method achieves a bit-rate of 34.18 bits per frame, which is 1.84 bits per frame lower than that of Huffman coding without voicing information.
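The idea of class-dependent Huffman tables can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class labels, the toy quantizer indices, and the helper names huffman_code_lengths and total_bits are all hypothetical, chosen only to show why conditioning the code table on a side parameter that the decoder already receives can lower the total bit count when the symbol distribution differs between classes.

```python
import heapq
from collections import Counter, defaultdict


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    if len(counts) == 1:
        # A single symbol still needs one bit in this simple sketch.
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def total_bits(frames, per_class):
    """Total coded size of frames, given as (voicing_class, quantizer_index) pairs."""
    if per_class:
        # One Huffman table per voicing class (the class is sent as side information).
        by_class = defaultdict(Counter)
        for cls, idx in frames:
            by_class[cls][idx] += 1
        tables = {cls: huffman_code_lengths(c) for cls, c in by_class.items()}
        return sum(tables[cls][idx] for cls, idx in frames)
    # Baseline: a single Huffman table that ignores the voicing class.
    table = huffman_code_lengths(Counter(idx for _, idx in frames))
    return sum(table[idx] for _, idx in frames)


# Hypothetical toy data: index 0 dominates one class, index 2 the other.
frames = (
    [("voiced", 0)] * 6 + [("voiced", 1)] + [("voiced", 2)]
    + [("unvoiced", 2)] * 6 + [("unvoiced", 1)] + [("unvoiced", 0)]
)

bits_single = total_bits(frames, per_class=False)
bits_per_class = total_bits(frames, per_class=True)
print(bits_single, bits_per_class)
```

In this toy case the per-class tables need fewer bits because each class concentrates its probability mass on a different index. The figures reported in the abstract come from the authors' experiments on real MFCC data, not from a sketch like this one.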