Fast SOLA-Based Time Scale Modification Using Envelope Matching

Peter H.W. Wong

doi:10.1023/a:1023387921411

Abstract

Time scale modification (TSM) of speech and audio signals is useful in many applications, such as MPEG-4 and fast/slow browsing of pre-recorded materials. Synchronized Overlap-and-Add (SOLA) is a time-domain TSM algorithm known to achieve good speech and audio quality. One problem with SOLA is that it requires a large amount of computation to search for the best matching point between the analysis and synthesis frames. In this paper, we propose two algorithms, envelope-matching TSM (EM-TSM) and modified EM-TSM (MEM-TSM), that simplify this computation with negligible degradation in perceptual quality. In EM-TSM, the search uses 1-bit sign information in place of the full-precision signal samples used in SOLA. Three additional measures, namely simplified formulation, recursive computation, and search-point reduction, are applied to reduce the computation significantly. In MEM-TSM, we reduce the computation of EM-TSM further by introducing zero-crossing point reduction and predictive search skipping. We also improve the quality of the time-scaled signals by introducing multiple-candidate re-examination and frame-size modification. Simulation results show that the proposed MEM-TSM can achieve computational reduction factors as large as 300 with very good perceptual quality of time-scaled speech and audio.
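To make the contrast between the two kinds of search concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes the SOLA search maximizes a normalized cross-correlation over candidate lags, and it stands in for the 1-bit search with a simple count of sign agreements. The function names (`sola_offset`, `sign_offset`), the parameters (`max_lag`, `overlap`), and the matching measure are illustrative assumptions, and none of the further reduction measures (recursive computation, search-point reduction, and so on) are included.

```python
import math


def sign_bits(x):
    """Reduce each sample to 1-bit sign information (True if non-negative)."""
    return [s >= 0 for s in x]


def sola_offset(synth_tail, analysis, max_lag, overlap):
    """Full-precision search (assumed form): return the lag in 0..max_lag
    that maximizes the normalized cross-correlation between the tail of
    the synthesis signal and the analysis frame."""
    best_lag, best_score = 0, -math.inf
    energy_ref = sum(a * a for a in synth_tail)
    for lag in range(max_lag + 1):
        seg = analysis[lag:lag + overlap]
        num = sum(a * b for a, b in zip(synth_tail, seg))
        den = math.sqrt(energy_ref * sum(b * b for b in seg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def sign_offset(synth_tail, analysis, max_lag, overlap):
    """1-bit search (illustrative stand-in): return the lag in 0..max_lag
    with the largest number of samples whose signs agree. Only
    comparisons and counting are needed, no multiplications."""
    ref = sign_bits(synth_tail)
    bits = sign_bits(analysis)
    best_lag, best_score = 0, -1
    for lag in range(max_lag + 1):
        score = sum(1 for a, b in zip(ref, bits[lag:lag + overlap]) if a == b)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test signal: a sinusoid with a 40-sample period.
analysis = [math.sin(2 * math.pi * n / 40 + 0.3) for n in range(200)]
synth_tail = analysis[13:13 + 60]  # the true alignment is at lag 13

lag_full = sola_offset(synth_tail, analysis, max_lag=30, overlap=60)
lag_sign = sign_offset(synth_tail, analysis, max_lag=30, overlap=60)
```

On this clean periodic signal both searches recover the same lag. Real speech and audio are noisier, which is where quality-oriented refinements such as the multiple-candidate re-examination mentioned above would come into play.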
