Abstract

Speaker-independent multi-pitch tracking has been a long-standing problem in speech processing. In this study, we extend a recurrent neural network-factorial hidden Markov model (RNN-FHMM) framework and use the utterance-level permutation invariant training (uPIT) criterion for multi-pitch tracking. Separated speech and label permutations from a speech separation uPIT-RNN are further incorporated to improve pitch tracking performance. We evaluate our methods on the GRID database. Results indicate that the proposed speech separation-pitch tracking system with matched uPIT label permutations outperforms all other gender-dependent and speaker-independent multi-pitch trackers. The improvement is more significant for challenging same-gender mixtures.
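To illustrate the uPIT criterion mentioned above, the following is a minimal sketch (not the paper's implementation): the loss between network outputs and reference sources is evaluated under every possible output-to-target assignment over the whole utterance, and the minimum is kept. The network, features, and pitch-state details are omitted; the MSE objective and two-source setup are illustrative assumptions.

```python
import itertools
import numpy as np

def upit_loss(estimates, targets):
    """Utterance-level permutation invariant training (uPIT) sketch:
    score every assignment of outputs to reference sources using the
    total utterance-level MSE, and keep the minimum.
    Returns (loss, best_permutation)."""
    n = len(targets)
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(n)):
        # sum the per-source MSE over the entire utterance for this assignment
        loss = sum(np.mean((estimates[p] - targets[i]) ** 2)
                   for i, p in enumerate(perm))
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# toy example: two 1-D "utterances", with the estimates given in swapped order
t1, t2 = np.ones(8), np.zeros(8)
loss, perm = upit_loss([t2, t1], [t1, t2])  # perm recovers the swap: (1, 0)
```

The recovered permutation is exactly the "label permutation" the abstract refers to: once the separation network has committed to an output-to-speaker assignment for an utterance, the same assignment can be reused by the downstream pitch tracker.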
