Abstract

This paper presents a comparative evaluation of 12 typical methods for estimating the fundamental frequency (F0), carried out on large speech datasets in reverberant environments. The methods include classical algorithms such as the cepstrum, AMDF, LPC, and autocorrelation methods, as well as more recent algorithms based on instantaneous amplitude and/or frequency, such as TEMPO, IFHC, and PHIA. The results revealed that both the percentage of correct F0 estimates and the SNRs of the estimated F0s dropped drastically as the reverberation time increased. This paper therefore proposes a method for robustly and accurately estimating F0 in reverberant environments that uses the modulation transfer function (MTF) concept and the source-filter model within complex cepstrum analysis. The MTF concept is used to eliminate the dominant reverberant characteristics from the observed reverberant speech, and the source-filter model is used to extract source information from the processed cepstrum. Finally, F0s are estimated from this source information by comb filtering. An additional comparative evaluation was carried out on the proposed method and the other typical methods. The results demonstrated that the proposed method was more robust and provided more accurate F0 estimates in reverberant environments than the previously reported methods. [Work supported by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, No. 18680017.]
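
For readers unfamiliar with the classical baselines named above, the following is a minimal sketch of autocorrelation-based F0 estimation on a single frame. It is not the proposed MTF and complex-cepstrum method, and it does not model reverberation. The function name `estimate_f0_autocorr`, the search range `f0_min`/`f0_max`, and the synthetic test tone are all assumptions made for illustration only.

```python
import math


def estimate_f0_autocorr(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    Illustrative baseline only: the name and default search range are
    assumptions, not taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]  # remove DC offset before correlating

    # Restrict the lag search to the plausible pitch-period range.
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), n - 1)

    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised (biased) autocorrelation: fewer terms at longer lags,
        # which favours the shortest period over its integer multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag


# Synthetic, reverberation-free test tone: 125 Hz fundamental plus 2nd harmonic.
fs = 8000
f0_true = 125.0
frame = [
    math.sin(2 * math.pi * f0_true * t / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0_true * t / fs)
    for t in range(512)
]
f0_est = estimate_f0_autocorr(frame, fs)
print(f0_est)
```

On this clean synthetic tone the autocorrelation peak falls at a lag of 64 samples, which corresponds to 125 Hz. Reverberation smears the periodic structure that this peak-picking relies on, which is the kind of degradation the evaluation above reports for such methods.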
