Abstract

This paper presents a method for the interpolation of lost speech segments. The method can be used for packet loss concealment in voice communication over mobile phones and voice over IP, or for the restoration of lost segments in speech recordings. It combines a linear prediction (LP) model of the spectral envelope with a harmonic noise model (HNM) of the speech excitation. The speech interpolation problem is thereby transformed into the modeling and interpolation of the trajectories of the LP parameters and of the amplitude, phase and harmonicity of the HNM tracks of the excitation. In particular, interpolating the harmonicity yields a smooth transition from voiced to unvoiced speech and vice versa. Crucially, the proposed method does not suffer from the consequences of the zero-excitation assumption of conventional autoregressive (AR) interpolation. Different combinations of linear and autoregressive interpolation are evaluated for estimating the time-varying parameters of the LP-HNM tracks. Furthermore, a post-processing codebook mapping, employed to enhance the interpolation of the spectral envelope, improves output quality for longer speech gaps. The proposed methods are evaluated for different packet loss rates and distribution patterns of the missing speech gaps, and are compared with popular AR-based interpolation methods and, as a reference, with the speech packet recovery method specified in the ITU G.711 standard. The evaluation results show that the proposed methods substantially improve the restoration of formants and harmonic tracks and consistently result in a significant performance gain and improved perceptual quality of speech.
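To make the idea of interpolating parameter tracks across a gap concrete, the following is a minimal sketch of the linear variant only, applied to per-harmonic amplitude and harmonicity. It is not the authors' implementation: the `HnmFrame` container, the function names, and the representation of harmonicity as a value in [0, 1] are assumptions made for illustration, and phase, LP parameters, the AR variant and the codebook mapping are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HnmFrame:
    """Hypothetical per-frame HNM parameters (illustrative names only)."""
    amplitudes: List[float]   # one amplitude per harmonic track
    harmonicity: List[float]  # assumed voicing degree per track, 1.0 = fully harmonic


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a (t = 0) and b (t = 1)."""
    return a + (b - a) * t


def interpolate_gap(before: HnmFrame, after: HnmFrame, n_missing: int) -> List[HnmFrame]:
    """Estimate n_missing lost frames from the frames on either side of the gap.

    Each track parameter moves on a straight line from its value in the last
    received frame to its value in the first frame after the gap.
    """
    if len(before.amplitudes) != len(after.amplitudes):
        raise ValueError("frames must have the same number of harmonic tracks")
    frames = []
    for i in range(1, n_missing + 1):
        t = i / (n_missing + 1)  # fractional position of frame i inside the gap
        frames.append(HnmFrame(
            amplitudes=[lerp(a, b, t)
                        for a, b in zip(before.amplitudes, after.amplitudes)],
            harmonicity=[lerp(a, b, t)
                         for a, b in zip(before.harmonicity, after.harmonicity)],
        ))
    return frames


# A voiced frame followed, after three lost frames, by an unvoiced frame.
voiced = HnmFrame(amplitudes=[1.0, 0.5], harmonicity=[1.0, 1.0])
unvoiced = HnmFrame(amplitudes=[0.2, 0.1], harmonicity=[0.0, 0.0])
gap = interpolate_gap(voiced, unvoiced, n_missing=3)
```

In this toy case the harmonicity of each track falls in equal steps across the three reconstructed frames, which is the kind of gradual voiced-to-unvoiced transition the abstract describes, rather than an abrupt switch at the gap boundary.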
