Abstract

In this paper, we propose a method for detecting glottal closure instants (GCIs) in the voiced regions of speech signals. The method is based on the periodicity of significant excitations of the vocal tract system. The key idea is the computation of a coherent covariance sequence, which overcomes the effect of the dynamic range of the excitation source signal while preserving the locations of significant excitations. The Hilbert envelope of the linear prediction residual is used as an estimate of the source of excitation of the vocal tract system. Performance of the proposed method is evaluated in terms of the deviation between true GCIs and hypothesized GCIs, using clean and degraded speech signals. The signal-to-noise ratio (SNR) of speech signals in the vicinity of GCIs has a significant bearing on the performance of the proposed method. The proposed method is accurate and robust for detection of GCIs, even in the presence of degradations.

Index Terms: glottal closure instants, excitation source, periodicity, coherent covariance sequence

Introduction

Voiced speech is produced by exciting a time-varying vocal tract system with a sequence of impulse-like excitations. The impulse-like excitation is due to the closure of the glottis during the vibration of the vocal folds. The time instant at which the closure is achieved (called the glottal closure instant, or GCI) is an important feature for the analysis of speech signals. Detection of GCIs enables the identification of the region of closed glottis within a pitch period. Analysis of short segments of speech signals over such regions helps in accurate estimation of vocal tract parameters such as formants [1], and also in the extraction of characteristics of the voice source. In text-to-speech synthesis, accurate detection of GCIs is necessary for prosodic manipulation of speech sounds [2].
Moreover, the speech signal in the vicinity of GCIs has a relatively high signal-to-noise ratio (SNR), due to the impulse-like excitation and the damped sinusoid-like impulse response of the vocal tract system. These regions of high SNR are likely to preserve features specific to the sound and the speaker, even under the influence of degradations. Hence, methods for robust detection of GCIs in speech signals are necessary.

Some of the methods proposed for the detection of GCIs assume a linear source-system model for the production of the speech signal. These methods identify the GCI with the time instant of strongest excitation, which lies around the region of least predictability [3, 4, 5]. Normally, the linear prediction (LP) residual is used as an estimate of the source of excitation [5]. Another class of methods for detection of GCIs is based on the properties of minimum phase signals and group delay functions. In [6], the average slope of the unwrapped phase spectrum of the speech signal is computed as a function of time, and the positive zero crossings of the phase slope function are hypothesized as the instants of glottal closure. In [7], the phase spectrum is computed from the LP residual instead of the speech signal to reduce the effects of truncation. Robustness of the group delay based methods against noise and distortion is studied in [8]. In [9], properties of the phase slope function are used to hypothesize candidates for GCIs, which are validated using a dynamic programming approach. Energy-weighted group delay is proposed as a measure for the detection of GCIs in [10].

In this paper, we propose a method for detection of instants of glottal closure using the periodicity of significant excitations in speech signals. In Section 2, we describe the representation of the excitation source in terms of the Hilbert envelope of the linear prediction residual. The same section describes the proposed method for detection of GCIs, and the issues involved in the choice of parameters used in the method. Section 3 discusses the experiments conducted for evaluating the performance of the proposed method, and the results of these studies. Conclusions are given in Section 4.
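
The excitation-source representation mentioned above, the Hilbert envelope of the LP residual, can be illustrated with a minimal sketch. It covers only that representation, not the coherent covariance sequence, whose details are not given in this section. The LP order of 10, the autocorrelation method of LP analysis, the FFT-based analytic signal, and the synthetic two-pole test signal in the hypothetical helper `synth_voiced` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def lp_residual(signal, order=10):
    """LP residual via the autocorrelation method (order is an assumed value)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a tiny ridge for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Inverse filter 1 - sum_k a_k z^{-k} applied to the signal.
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[:n]

def hilbert_envelope(e):
    """Magnitude of the analytic signal, computed with the FFT."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    spectrum = np.fft.fft(e)
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def synth_voiced(n_samples=800, period=80, first=40,
                 r=0.95, freq=700.0, fs=8000.0):
    """Hypothetical test signal: impulse train through a two-pole resonator."""
    theta = 2.0 * np.pi * freq / fs
    excitation = np.zeros(n_samples)
    excitation[first::period] = 1.0
    y = np.zeros(n_samples)
    for n in range(n_samples):
        y[n] = excitation[n]
        if n >= 1:
            y[n] += 2.0 * r * np.cos(theta) * y[n - 1]
        if n >= 2:
            y[n] -= r * r * y[n - 2]
    return y, np.flatnonzero(excitation)
```

Calling `hilbert_envelope(lp_residual(speech))` on the output of `synth_voiced()` gives a non-negative sequence whose largest values fall at the impulse locations of the synthetic excitation. Unlike the residual itself, the envelope does not depend on the polarity or phase of the residual around each excitation, which is one common motivation for using it as a source estimate.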
