In linear predictive speech coders, the linear predictive coding (LPC) parameters are usually transformed into the line spectral frequency (LSF) representation for quantization. In this paper, single- and multiple-frame coding of LSF parameters using a deep neural network (DNN) and a pyramid vector quantizer (PVQ) is proposed. In the single-frame scheme, a non-linear DNN predictor, which demonstrates much better prediction performance than the autoregressive (AR) model, is applied to exploit the inter-frame dependency of LSF parameters. The prediction residual has a Laplacian distribution and can therefore be efficiently quantized by the PVQ. A performance evaluation using spectral distortion shows that the proposed DNN predictive PVQ outperforms AR predictive split vector quantization (SVQ). In the multiple-frame scheme, a deep autoencoder with linear coder-layer units subjected to Gaussian noise is used to compress and de-correlate multiple LSF frames. The deep autoencoder shows a high degree of modelling flexibility for multiple LSF frames. To quantize the coder-layer vector effectively, a PVQ is again employed. The experimental results show that the proposed multiple-frame scheme, with an optimally chosen coder-layer dimension, outperforms the discrete cosine model (DCM)-based approach in terms of both spectral distortion and robustness across different speech segments.
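The pyramid vector quantizer used in both schemes maps a residual vector to the nearest point on an integer "pyramid" of fixed L1 norm K. The following is a minimal illustrative sketch of that codebook search, not the authors' implementation; the function name and the greedy pulse-correction strategy are assumptions:

```python
import numpy as np

def pvq_encode(x, K):
    """Project x onto the PVQ codebook {y integer : sum(|y_i|) = K}.

    Greedy search: scale x so its L1 norm is K, round to integers,
    then add or remove unit pulses until exactly K pulses remain.
    """
    x = np.asarray(x, dtype=float)
    l1 = np.sum(np.abs(x))
    if l1 == 0.0:
        y = np.zeros(len(x), dtype=int)
        y[0] = K                      # arbitrary codeword for the zero vector
        return y
    t = K * x / l1                    # ideal (real-valued) pyramid point
    y = np.round(t).astype(int)
    while np.sum(np.abs(y)) < K:      # too few pulses: add where deficit is largest
        i = np.argmax(np.abs(t) - np.abs(y))
        y[i] += 1 if t[i] >= 0 else -1
    while np.sum(np.abs(y)) > K:      # too many pulses: remove where surplus is largest
        nz = np.nonzero(y)[0]
        i = nz[np.argmax(np.abs(y[nz]) - np.abs(t[nz]))]
        y[i] -= np.sign(y[i])
    return y

# Example: a Laplacian-like residual quantized with K = 4 pulses
print(pvq_encode([0.7, -0.2, 0.1], 4))  # → [ 3 -1  0]
```

The decoder would reconstruct the residual as a gain times y normalized to unit norm; only the pulse pattern y and the gain need to be transmitted, which is what makes PVQ attractive for Laplacian-distributed residuals.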