Abstract

This paper proposes a method for extracting the fundamental frequency of voiced speech from distant speech signals. The method is based on the impulse-like nature of excitation in voiced speech. The characteristics of impulse-like excitation are extracted by filtering the speech signal through a cascade of resonators located at zero frequency. The resulting filtered signal preserves information specific to the fundamental frequency in its sequence of positive-to-negative zero crossings, and it is free from the effects of vocal tract resonances. An estimate of the fundamental frequency is derived from the short-time spectrum of the filtered signal, and this estimate is used to remove spurious zero crossings in the filtered signal. The proposed method depends only on the strengths of impulse-like excitations in the direct component of distant speech signals, and not on the similarity of the speech signal in successive glottal cycles. Hence, the method is robust to the effects of reverberation and noise. Performance of the method is evaluated using a database of close-speaking and distant speech signals. Experiments show that the accuracy of the proposed method is significantly higher than that of existing methods based on time-domain and frequency-domain processing.
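
The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each zero-frequency resonator is the ideal two-pole filter y[n] = 2y[n-1] - y[n-2] + x[n], so that a cascade of two resonators amounts to four running sums, and that the polynomial trend produced by this integration is removed by repeatedly subtracting a local mean over roughly 1.5 average pitch periods. The window length, the number of mean-subtraction passes, the initial differencing, and the names `zero_frequency_filter`, `negative_going_zero_crossings`, and `avg_pitch_period_s` are assumptions made for illustration; none of them is stated in the abstract. The pruning of spurious zero crossings with the short-time spectral estimate is not reproduced here.

```python
import numpy as np


def zero_frequency_filter(x, fs, avg_pitch_period_s=0.010, passes=3):
    """Sketch of zero-frequency filtering (illustrative, assumed parameters).

    Returns the trend-removed signal and the number of samples trimmed
    from each end, so indices can be mapped back to the input signal.
    """
    x = np.asarray(x, dtype=np.float64)
    # Difference the signal to suppress any DC offset before integrating.
    y = np.diff(x, prepend=x[0])
    # Two resonators at 0 Hz, each y[n] = 2y[n-1] - y[n-2] + x[n],
    # are equivalent to four successive running sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Local-mean window of about 1.5 average pitch periods (assumed value).
    half = max(1, int(round(0.75 * avg_pitch_period_s * fs)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    offset = 0
    for _ in range(passes):
        # Keep only samples with a full window so the edges stay exact.
        y = y[half:-half] - np.convolve(y, kernel, mode="valid")
        offset += half
    return y, offset


def negative_going_zero_crossings(y, offset=0):
    """Indices (in input-signal samples) where y goes from positive to non-positive."""
    crossings = np.flatnonzero((y[:-1] > 0) & (y[1:] <= 0)) + 1
    return crossings + offset


# Usage on a synthetic impulse train: 100 Hz excitation sampled at 8 kHz.
fs = 8000
x = np.zeros(4000)
x[::80] = 1.0
filtered, offset = zero_frequency_filter(x, fs)
epochs = negative_going_zero_crossings(filtered, offset)
f0_per_cycle = fs / np.diff(epochs)  # one F0 value per glottal cycle
print(np.median(f0_per_cycle))
```

Because the integration attenuates higher frequencies much more strongly than the fundamental, the filtered signal in this synthetic case is nearly sinusoidal, with one positive-to-negative zero crossing per excitation period. On real distant speech the crossings would still need the spectrum-based check that the abstract describes.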
