Abstract

Time-frequency voiced and unvoiced models are proposed for the excitation of a harmonic autoregressive wideband speech analysis-synthesis system. The time-frequency voiced excitation (TFVEX) model has low time resolution, defined by the concentration of the excitation signal distribution in the modulation domain, whereas the time-frequency unvoiced excitation (TFUNEX) model has cycle time discrimination with lower amplitude resolution. The frequency resolution of both models is one octave. In listening tests, the speech reconstructed by the compound time-frequency unvoiced-voiced excitation (TFUVEX) model is rated above speech degraded by a modulated noise reference unit (MNRU) at 25 dB, while yielding a parametric compression of more than ten times.

Highlights

  • Harmonic speech representations have been used for coding at medium to low bit rates [1] even when they fail to achieve perfect reconstruction

  • Each signal was synthesized by the analysis-synthesis system (ASyS) in seven test conditions: the analyzed Voiced Part; the modeled Voiced Part plus the analyzed Unvoiced Part (TFVEX); the modeled Voiced Part alone (TFVEX Alone); the analyzed Voiced Part plus the modeled Unvoiced Part, using the time-frequency unvoiced excitation model (TFUNEX); the analyzed Voiced Part plus the modeled Unvoiced Part with pitch-synchronous spectral weighting (PSSW) emphasis (TFUNEX-PSSW); the modeled Voiced Part combined with the modeled Unvoiced Part, using the time-frequency unvoiced-voiced excitation model (TFUVEX); and the modeled Voiced Part plus the modeled Unvoiced Part with PSSW (TFUVEX-PSSW)

  • All single-model conditions are scored above 25 dB modulated noise reference unit (MNRU) and the only two-model condition which rises above this level is TFUNEX-PSSW, underlining the distinctive contribution of the unvoiced model to higher fidelity
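
The 25 dB MNRU reference used as the quality anchor above can be illustrated with a minimal Python sketch. It assumes the basic multiplicative-noise form of the MNRU, y = x(1 + 10^(-Q/20) n), with unit-variance Gaussian noise n and Q in dB, and it omits the band-limiting filters of a full reference unit. The function name `mnru`, the seed, and the 200 Hz test tone are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def mnru(signal, q_db, seed=0):
    """Degrade `signal` with signal-modulated noise at a ratio of q_db decibels.

    Simplified sketch: no input or output band-pass filtering is applied.
    """
    x = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(x.shape)   # unit-variance Gaussian noise
    gain = 10.0 ** (-q_db / 20.0)          # noise amplitude relative to the signal
    return x * (1.0 + gain * noise)


# Hypothetical test signal: one second of a 200 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 200.0 * t)

y = mnru(x, q_db=25.0)

# Ratio of signal power to modulated-noise power, in dB (should be close to Q).
snr_db = 10.0 * np.log10(np.sum(x ** 2) / np.sum((y - x) ** 2))
print(f"measured signal-to-modulated-noise ratio: {snr_db:.2f} dB")
```

Because the noise is multiplied by the signal, the degradation follows the speech envelope, which is why the MNRU serves as a graded reference in listening tests: a condition rated above the 25 dB MNRU is perceived as less degraded than speech carrying that amount of modulated noise.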

Summary

INTRODUCTION

Harmonic speech representations have been used for coding at medium to low bit rates [1] even when they fail to achieve perfect reconstruction. The classification of speech segments into voiced and unvoiced classes is important for speech modification and speech coding, since the two classes are processed differently. In this work, a voiced model is proposed for an NPR frontend representation [9]. This model fits into a framework that includes the previously proposed unvoiced model [10], so that sparse speech representations may be achieved. Unlike usual harmonic representations which apply a hard decision for voiced and unvoiced speech classification in the time and/or in the frequency domain, the voiced-unvoiced decision.

SPEECH EXCITATION CLASSIFICATION AND
THE TIME-FREQUENCY VOICED EXCITATION MODEL
THE TIME-FREQUENCY UNVOICED EXCITATION MODEL
THE COMPOUND TIME-FREQUENCY UNVOICED-VOICED
EXPERIMENTAL RESULTS
CONCLUSION