Abstract

The major impulse-like excitation in the speech signal is due to abrupt closure of the vocal folds, which takes place at the glottal closure instant (GCI) or epoch in each cycle. GCIs are used in many areas of speech science and technology, such as in prosody modification, voice source analysis, formant extraction and speech synthesis. It is difficult to observe these discontinuities (corresponding to GCIs) in the speech signal because of the superimposed time-varying response of the vocal tract system. This paper examines the phase part of different frequency components of the speech signal to extract epochs. Three analysis methods to decompose the speech signal into different frequency components are considered. These methods are the short-time Fourier transform (STFT), narrow bandpass filtering (NBPF), and single frequency filtering (SFF). The locations of the discontinuities in the speech signal are obtained from the instantaneous frequency (IF) (i.e., the time derivative of the phase) of each of the frequency components. A method for automatic detection of epochs using the amplitude weighted IF is proposed. Performance of the proposed epoch detection method is compared with four state-of-the-art methods in clean and telephone quality speech. The performance of the proposed method is comparable with the performance of the existing epoch detection methods for clean speech but better for telephone quality speech.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call