Abstract

The pitch period is an essential feature in many speech-processing tasks. Most practical applications collect speech signals in complex noisy environments, so the noise resistance of pitch estimation algorithms has become more critical than ever. However, many state-of-the-art algorithms fail to produce good results on noisy speech at low signal-to-noise ratio (SNR). This study presents a new noise-resistant pitch estimation algorithm based on the Radon transform, which reduces the influence of formants by modifying the classical equation. In addition, we use the difference between the pitch candidates of consecutive frames as part of the decoding criterion of the Viterbi algorithm, which strengthens the correlation between pitch estimates and makes the pitch contours smoother. We synthesized three noisy speech databases from 18 types of collected environmental noise and compared our algorithm with 7 state-of-the-art algorithms. The proposed algorithm performs best on the CSTR and self-recorded databases and reduces the Gross Pitch Error (GPE) rate by over 12% at 0 dB SNR relative to the Bayesian Pitch Tracker. In particular, the GPE rate of the proposed algorithm stays under 25% at 0 dB SNR, whereas BaNa only achieves 35%.
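
The Viterbi decoding idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual decoding criterion, so the function name `viterbi_pitch_track`, the `jump_weight` parameter, and the linear penalty on the absolute pitch difference are all assumptions made for illustration. The per-frame candidates and their confidence scores are taken as given; the Radon-transform step that would produce them is not shown.

```python
from typing import List, Sequence


def viterbi_pitch_track(
    candidates: Sequence[Sequence[float]],
    scores: Sequence[Sequence[float]],
    jump_weight: float = 0.01,
) -> List[float]:
    """Pick one pitch candidate per frame so the contour is smooth.

    candidates[t][k]: k-th pitch candidate (Hz) in frame t.
    scores[t][k]:     local confidence of that candidate (higher is better).
    jump_weight:      hypothetical penalty per Hz of pitch change between
                      consecutive frames (an assumption, not from the paper).
    """
    n_frames = len(candidates)
    if n_frames == 0:
        return []

    # best[t][k]: best cumulative score of any path ending at candidate k.
    best = [list(scores[0])]
    # back[t][k]: index of the predecessor candidate on that best path.
    back = [[-1] * len(candidates[0])]

    for t in range(1, n_frames):
        row_best, row_back = [], []
        for k, f_cur in enumerate(candidates[t]):
            # Transition cost grows with the pitch difference between
            # candidates of consecutive frames.
            options = [
                best[t - 1][j] - jump_weight * abs(f_cur - f_prev)
                for j, f_prev in enumerate(candidates[t - 1])
            ]
            j_star = max(range(len(options)), key=options.__getitem__)
            row_best.append(options[j_star] + scores[t][k])
            row_back.append(j_star)
        best.append(row_best)
        back.append(row_back)

    # Backtrack from the best final candidate.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(n_frames - 1, -1, -1):
        path.append(candidates[t][k])
        k = back[t][k]
    path.reverse()
    return path


# Toy example: frame-wise argmax would pick 100 Hz, then jump to 205 Hz.
toy_candidates = [[100, 200], [102, 205], [210, 104]]
toy_scores = [[0.9, 0.8], [0.7, 0.8], [0.8, 0.7]]
smooth_contour = viterbi_pitch_track(toy_candidates, toy_scores)
```

In the toy input, choosing the highest-scoring candidate in each frame independently would yield a contour that jumps from 100 Hz to 205 Hz, while the difference penalty steers the decoder toward the contour whose consecutive values stay close together. A real system would likely tune the penalty and might measure the difference on a logarithmic frequency scale.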
