Abstract

For low-bit-rate coding and synthesis, the evolution of spectral parameters is a source of redundancy to be considered. A triangular interpolation spectral measure (TRISM) is proposed as the basis for an open-loop event location criterion for low-delay temporal decomposition (TD). TRISM improves on the spectral transition measure (STM) as a measure of linear interpolation error. While STM is heuristic and presupposes asymmetric event functions, TRISM is a minimum square interpolation error based on symmetric functions. Minimum TRISM (MINTRISM) TD interpolates up to 13 frames between adjacent events at a mean event rate of 15 Hz and an interpolation error level equivalent to that of standard low-bit-rate speech coders. The MINTRISM criterion also gives a more stable solution for locating events and determining their number than previous global and local TD methods.

Highlights

  • The representation of speech spectral features plays a central role in speech coding, synthesis and recognition

  • For a given location window length 2M + 1, this is equivalent to the determination of frame locations n that locally minimize the triangular interpolation spectral measure (TRISM), defined by TM(n) =

  • Speech spectral envelopes were obtained at a frame rate of 200 Hz as the line spectral frequency (LSF) vector representation that results from tenth-order linear prediction (LP) analysis of a 25 ms segment of speech extracted through a Hamming window

Summary

INTRODUCTION

The representation of speech spectral features plays a central role in speech coding, synthesis and recognition. The line spectral frequency (LSF) coefficients [1], [2], [3] are the representation of choice for the spectral vectors in speech coding because they are robust to quantization and interpolation errors. In the synthesis or recognition phase, event targets are interpolated by means of event functions in order to reconstruct the parameter tracks. While such algorithms are useful for speech recognition [5], store-and-forward messaging applications [6] and for compressing speech synthesis corpora [7], low-delay TD algorithms are necessary for two-way coding applications. For low-bit-rate speech coding, weighted distortion measures across frame time and frequency [11], [12] should be considered.
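
The reconstruction step mentioned above, interpolating event targets with event functions to recover the parameter tracks, can be illustrated with a minimal sketch. It assumes triangular event functions that overlap only with their immediate neighbours, which amounts to linear interpolation between adjacent targets; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def triangular_event_functions(event_frames, num_frames):
    """Build phi[k, n]: 1 at event k, falling linearly to 0 at the adjacent events.

    `event_frames` must be sorted in increasing order (assumption of this sketch).
    """
    event_frames = np.asarray(event_frames, dtype=float)
    frames = np.arange(num_frames)
    phi = np.zeros((len(event_frames), num_frames))
    for k in range(len(event_frames)):
        unit = np.zeros(len(event_frames))
        unit[k] = 1.0
        # np.interp holds the end value constant outside the first/last event.
        phi[k] = np.interp(frames, event_frames, unit)
    return phi

def reconstruct(targets, event_frames, num_frames):
    """Weighted sum of event targets: track[n] = sum_k phi[k, n] * targets[k]."""
    phi = triangular_event_functions(event_frames, num_frames)
    return phi.T @ np.asarray(targets, dtype=float)

# Toy example: three events, two-dimensional "spectral" vectors.
event_frames = [0, 4, 8]
targets = [[0.0, 10.0], [4.0, 2.0], [0.0, 10.0]]
phi = triangular_event_functions(event_frames, 9)
track = reconstruct(targets, event_frames, 9)
```

Because adjacent triangles sum to one at every frame, each reconstructed vector is a convex combination of the two nearest targets, and only the targets and their frame positions need to be transmitted or stored.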

SPECTRAL MEASURES FOR EVENT LOCATION
LOCAL TEMPORAL DECOMPOSITION
EXPERIMENTS WITH TEMPORAL DECOMPOSITION AND
Uniform linear interpolation
CONCLUSION
