Abstract

Epochs in voiced speech are defined as the time instants of significant excitation of the vocal tract system during speech production. The nonstationary nature of the excitation source and the vocal tract system makes accurate identification of epochs difficult. Most existing epoch detection methods require prior knowledge of the voiced regions and a rough estimate of the pitch frequency. In this paper, we propose a novel method that relies on a time-order representation (TOR) based on the short-time Fourier–Bessel (FB) series expansion, which can be applied to the entire speech signal to detect epochs without any prior information. The proposed method automatically detects voiced regions in the speech signal by computing the marginal energy density with respect to time in the low-frequency range (LFR) from the energy distribution in the time-frequency plane. An estimate of the pitch frequency for each detected voiced region is then obtained by computing the marginal energy density with respect to frequency in the LFR from the same energy distribution. Epochs are located for each detected voiced region as peaks in the derivative of the low-pass filtered (LPF) signal synthesized from the TOR coefficients corresponding to the LFR; these peaks correspond to the falling edges of the peak negative cycles in the LPF signal. Experimental results on speech signals from the CMU-Arctic database are promising. The proposed method detects epochs with high accuracy and reliability.
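
The two marginal-energy steps described above can be illustrated with a rough sketch. It is not the proposed method: an ordinary short-time Fourier transform stands in for the FB-based TOR, and the function names, the 50 to 500 Hz low-frequency range, the window and hop lengths, and the relative threshold are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(x, fs, win_len=0.03, hop=0.01):
    """Energy distribution |STFT|^2 on a (frame, frequency-bin) grid.

    Assumption: a Hann-windowed STFT replaces the FB-based TOR.
    """
    n = int(win_len * fs)
    h = int(hop * fs)
    window = np.hanning(n)
    starts = np.arange(0, len(x) - n + 1, h)
    frames = np.stack([x[s:s + n] * window for s in starts])
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    times = (starts + n / 2) / fs  # frame centres in seconds
    return energy, freqs, times

def voiced_mask_and_pitch(x, fs, lfr=(50.0, 500.0), rel_threshold=0.1):
    """Hypothetical helper: voiced-frame mask and one coarse pitch estimate."""
    energy, freqs, times = spectrogram(x, fs)
    band = (freqs >= lfr[0]) & (freqs <= lfr[1])

    # Marginal energy density with respect to time, restricted to the LFR.
    time_marginal = energy[:, band].sum(axis=1)
    # Assumed rule: a frame is voiced if it exceeds a fraction of the maximum.
    voiced = time_marginal > rel_threshold * time_marginal.max()

    # Marginal energy density with respect to frequency over voiced frames.
    freq_marginal = energy[voiced][:, band].sum(axis=0)
    pitch_hz = freqs[band][np.argmax(freq_marginal)]
    return voiced, pitch_hz, times
```

Calling `voiced_mask_and_pitch(x, fs)` on a mono signal `x` sampled at `fs` Hz returns a per-frame boolean mask, a single pitch estimate in Hz, and the frame centre times. The sketch treats all voiced frames as one region, whereas the abstract describes a pitch estimate per detected voiced region, and the pitch resolution here is limited to the STFT bin spacing.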
