Abstract

Pitch extraction from speech is difficult in part because pitch is nonstationary and varies over a wide range. Most conventional methods assume that speech remains stationary within an analysis frame of fixed width, an assumption that inherently introduces extraction error. To remedy this, there have been attempts to apply the wavelet transform, whose resolution can be adjusted dynamically on the time–frequency plane, to pitch extraction [5, 6]. This paper proposes an implementation of pitch extraction based on the discrete wavelet transform, in which the pitch filtering is computed efficiently by a subband filter bank. Because the subband filter bank contains internal down-sampling, aliasing in the pitch filtering is a problem; the proposed method avoids it by scaling the input signal. Computer simulations are carried out on a synthesized test signal and on actual speech. The test signal contains the first three harmonic components, and its pitch frequency increases exponentially, so that the time variation is faster at higher frequencies. The results verify that the pitch can be extracted with an error rate of about ±2% at any pitch frequency. © 1999 Scripta Technica, Electron Comm Jpn Pt 3, 82(6):36–45, 1999
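
The synthesized test signal described above can be illustrated with a minimal sketch. The abstract specifies only that the signal contains the first three harmonics and that the pitch rises exponentially; the function name `exponential_pitch_signal`, the 100 Hz to 400 Hz range, the 8 kHz sampling rate, the one-second duration, and the equal harmonic amplitudes are all assumptions made here for illustration, not values taken from the paper.

```python
import math


def exponential_pitch_signal(f_start=100.0, f_end=400.0, duration=1.0,
                             sample_rate=8000, n_harmonics=3):
    """Return (samples, pitch_track) for a harmonic test signal.

    The pitch follows f0(t) = f_start * exp(k * t), with k chosen so that
    f0(duration) = f_end. All default values are illustrative assumptions.
    """
    n_samples = int(duration * sample_rate)
    k = math.log(f_end / f_start) / duration  # exponential growth rate
    samples, pitch_track = [], []
    for i in range(n_samples):
        t = i / sample_rate
        f0 = f_start * math.exp(k * t)
        # Instantaneous phase is the integral of 2*pi*f0(t) from 0 to t.
        if k == 0.0:
            phase = 2.0 * math.pi * f_start * t
        else:
            phase = 2.0 * math.pi * f_start * (math.exp(k * t) - 1.0) / k
        # Equal-amplitude harmonics (an assumption), normalized to [-1, 1].
        x = sum(math.sin(h * phase) for h in range(1, n_harmonics + 1))
        samples.append(x / n_harmonics)
        pitch_track.append(f0)
    return samples, pitch_track


samples, pitch_track = exponential_pitch_signal()
```

Because the phase is obtained by integrating the instantaneous frequency rather than by evaluating `sin(2*pi*f0*t)` directly, `pitch_track` gives the true pitch at every sample and could serve as the reference against which an extractor's relative error is measured.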
