Abstract
This paper addresses data compression in distributed speech recognition (DSR) using a variable frame rate and length analysis method. The method first selects frames using an a posteriori signal-to-noise ratio (SNR)-weighted energy distance, which sets the appropriate time resolution at the signal level. It then lengthens each selected frame according to the number of non-selected frames preceding it, which sets the appropriate time-frequency resolution at the frame level. The result is a high frame rate with short frames in rapidly changing regions and a low frame rate with long frames in steady regions. The method is applied to scalable source coding in DSR, where a target bitrate is met by adjusting the frame rate. Speech recognition results show that the proposed approach achieves higher recognition accuracy on noisy speech than other compression methods while also achieving higher compression rates.
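The abstract does not give the exact distance formula or frame-length rule, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes the distance at frame t is the absolute change in log energy weighted by a log-domain a posteriori SNR (frame energy over a known noise energy), that a frame is kept once the accumulated distance since the last kept frame reaches a threshold, and that each kept frame grows by one hop per skipped predecessor up to a cap. All function names (`weighted_energy_distance`, `select_frames`, `frame_lengths`, `threshold_for_frame_count`) and parameters are hypothetical. The last helper illustrates the scalable-coding step by bisecting the threshold to meet a frame budget, assuming bitrate is proportional to the number of transmitted frames.

```python
import numpy as np


def weighted_energy_distance(frames, noise_energy, eps=1e-10):
    """Per-frame a posteriori SNR-weighted log-energy distance (assumed form).

    frames: array of shape (num_frames, frame_len) from a fixed short-hop analysis.
    noise_energy: scalar noise energy estimate (assumed known, e.g. from leading silence).
    """
    log_e = np.log(np.sum(frames ** 2, axis=1) + eps)
    # Assumed a posteriori SNR in the log domain, floored at zero so that
    # noise-dominated frames contribute little to the distance.
    snr_post = np.maximum(log_e - np.log(noise_energy + eps), 0.0)
    dist = np.zeros(len(log_e))
    dist[1:] = np.abs(np.diff(log_e)) * snr_post[1:]
    return dist


def select_frames(dist, threshold):
    """Keep a frame whenever the distance accumulated since the last kept frame reaches threshold."""
    selected, acc = [0], 0.0
    for t in range(1, len(dist)):
        acc += dist[t]
        if acc >= threshold:
            selected.append(t)
            acc = 0.0
    return selected


def frame_lengths(selected, base_len, hop, max_len):
    """Lengthen each kept frame by one hop per non-selected preceding frame (assumed linear rule, capped)."""
    lengths, prev = [], -1
    for t in selected:
        skipped = t - prev - 1
        lengths.append(min(base_len + skipped * hop, max_len))
        prev = t
    return lengths


def threshold_for_frame_count(dist, target_count, iters=50):
    """Bisect the threshold so that at most target_count frames are kept (target_count >= 1)."""
    lo, hi = 0.0, float(np.sum(dist)) + 1.0  # at hi only frame 0 is kept
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if len(select_frames(dist, mid)) > target_count:
            lo = mid
        else:
            hi = mid
    return hi


# Usage with a hand-made distance sequence: a steady stretch (t = 3..5), then a large change at t = 6.
dist = np.array([0.0, 0.5, 0.6, 0.1, 0.1, 0.1, 2.0])
kept = select_frames(dist, threshold=1.0)
print(kept)                               # [0, 2, 6]
print(frame_lengths(kept, 256, 80, 512))  # [256, 336, 496]
```

In this sketch the frame after the steady stretch receives the longest window, mirroring the abstract's description of low frame rate and long frames in steady regions. Because the accumulated distance is non-negative, raising the threshold never increases the number of kept frames, which is what makes the bisection in `threshold_for_frame_count` valid.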