Abstract

Pitch, or fundamental frequency, estimation is an important problem in speech processing. Pitch extraction has been studied for many years, and numerous algorithms have been developed to improve its accuracy. The task becomes more difficult in the presence of additive noise and reverberation, because these distortions corrupt the periodicity information that is vital for estimating the pitch. In this paper, we present a quantitative analysis of pitch tracking in the presence of reverberation using different state-of-the-art methods. We compare neural network (NN) based approaches, such as the Subband Autocorrelation Classifier (SAcC), with signal-processing-based methods such as YIN and RAPT. We enhance the performance of SAcC by introducing a cross-correlogram feature (CC+SAcC). We further show that multi-style training of the NN using the CC+SAcC feature outperforms all the other methods. Experiments were conducted using artificially reverberated versions of the Keele and TIMIT databases, with room impulse responses of varying T60 values.
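
As a rough illustration of the experimental setup described above, the sketch below reverberates a signal by convolving it with a room impulse response and then estimates pitch frame by frame. It is not the method evaluated in the paper: the impulse response is a hypothetical stand-in (exponentially decaying white noise whose amplitude drops by 60 dB after T60 seconds), and the estimator is a plain autocorrelation peak picker rather than YIN, RAPT, or SAcC. The function names, the 60-400 Hz search range, and the frame sizes are all assumptions chosen for the example.

```python
import numpy as np

def synthetic_rir(t60, fs, rng):
    """Hypothetical RIR: white noise with an exponential decay of 60 dB over t60 seconds."""
    n = int(round(t60 * fs))
    t = np.arange(n) / fs
    decay = 10.0 ** (-3.0 * t / t60)   # amplitude reaches -60 dB at t = t60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0                       # direct path
    return rir / np.linalg.norm(rir)   # unit energy

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate f0 (Hz) of one frame from the largest autocorrelation peak."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: ac[k] is the correlation at lag k.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for the assumed pitch range
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag

def track_pitch(signal, fs, frame_len=512, hop=256):
    """Run the frame-level estimator over a whole signal."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([autocorr_pitch(signal[s:s + frame_len], fs) for s in starts])

fs = 8000
rng = np.random.default_rng(0)

# Toy "voiced" signal: five harmonics of a pitch gliding from 100 Hz to 200 Hz.
t = np.arange(fs) / fs
f0 = 100.0 + 100.0 * t
phase = 2.0 * np.pi * np.cumsum(f0) / fs
clean = sum(np.sin(h * phase) / h for h in range(1, 6))

# Artificial reverberation: convolve with the RIR and trim to the original length.
reverberant = np.convolve(clean, synthetic_rir(0.6, fs, rng))[:len(clean)]

clean_track = track_pitch(clean, fs)
reverb_track = track_pitch(reverberant, fs)
print(np.mean(np.abs(clean_track - reverb_track)))  # mean frame-level difference in Hz
```

Sweeping the `t60` argument gives a simple way to see how the two tracks diverge as the reverberation time grows. A real evaluation would use recorded speech with reference pitch labels rather than a synthetic glide.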
