Perceptual long-term harmonic plus noise modeling for speech data compression

Faten Ben Ali, Sonia Djaziri-Larbi

doi:10.1109/globalsip.2015.7418423

Abstract

The harmonic plus noise model (HNM) is widely used for modeling audio signals. In this paper, we introduce perceptual frequency masking into the 2-band HNM developed by Stylianou et al., applied to speech signals. An auditory model identifies inaudible sinusoids, which are then removed from the set of model parameters to reduce the data size for speech coding. The proposed perceptual HNM was applied to a large speech database drawn from TIMIT and HINT and achieves a substantial reduction in parameter rate (up to 50% in short-term frames), which in turn yields a significant data-rate reduction for the long-term (LT) HNM. The LT HNM is based on LT trajectory modeling of the short-term (ST) HNM parameters. Objective and subjective quality evaluations show that the perceptual HNM introduces no additional distortion compared to the generic 2-band HNM.
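
To make the idea of removing inaudible sinusoids concrete, the following is a minimal sketch of perceptual pruning for a single short-term frame. It is not the auditory model used in the paper, whose details the abstract does not give. It assumes a deliberately simplified stand-in: the Zwicker-Terhardt Bark approximation, Terhardt's formula for the threshold in quiet, and a triangular spreading function whose offset and slopes (14 dB, 25 dB/Bark, 10 dB/Bark) are illustrative values only. All function names and parameters are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold in dB SPL."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def masked_level_db(masker_hz, masker_db, probe_hz,
                    offset_db=14.0, lower_slope=25.0, upper_slope=10.0):
    """Masking threshold (dB) that one sinusoid imposes at probe_hz.

    Assumed triangular spreading on the Bark scale: masking decays by
    upper_slope dB/Bark above the masker and lower_slope dB/Bark below it.
    The offset and slopes are illustrative, not values from the paper.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def prune_inaudible(harmonics):
    """Keep only the (freq_hz, level_db) pairs at or above their threshold.

    The threshold for each component is the maximum of the threshold in
    quiet and the masking contributed by every other component in the frame.
    Taking the maximum (rather than summing powers) is a simplification.
    """
    kept = []
    for i, (f_hz, level_db) in enumerate(harmonics):
        threshold = threshold_in_quiet_db(f_hz)
        for j, (masker_hz, masker_db) in enumerate(harmonics):
            if j != i:
                threshold = max(threshold, masked_level_db(masker_hz, masker_db, f_hz))
        if level_db >= threshold:
            kept.append((f_hz, level_db))
    return kept


# Hypothetical frame: three harmonics of a 200 Hz fundamental (levels in dB SPL).
frame = [(200.0, 70.0), (400.0, 30.0), (600.0, 65.0)]
audible = prune_inaudible(frame)
print(audible)
print("parameters kept:", len(audible), "of", len(frame))
```

In this toy frame, the weak 400 Hz component lies under the masking spread of the strong 200 Hz component and is dropped, so fewer amplitude and phase parameters would need to be coded for the frame. A real coder would calibrate levels to dB SPL and use a validated psychoacoustic model.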

Full Text