Abstract

This paper presents a new method for estimating the fundamental frequency in a real noisy environment where several people speak at the same time, and considers the case of two speakers. Its main contribution is an accurate estimate of the pitch of the second speaker. The first pitch is determined from the Autocorrelation of the Multi-scale Product (AMP) of the mixture signal. A multiple-comb filter is then applied to eliminate the dominant signal. Subtracting the resulting signal from the mixture gives the residual signal. Next, the AMP is reapplied to the residual signal to estimate the second pitch, which yields a matrix of second-pitch candidates. The elements of this matrix are classified into three groups, and the mean of each column of the selected group is computed. This gives the intrusion pitch of each frame. Experiments are performed on the Cooke database. The results show the robustness and effectiveness of the proposed approach.
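As a rough illustration of the first stage only, the following Python sketch estimates one pitch per frame from the autocorrelation of a multi-scale product. It is not the authors' implementation: the derivative-of-Gaussian wavelet, the scales (1, 2, 4), the 60 to 400 Hz search range, and the function names `multiscale_product` and `amp_pitch` are all assumptions made for this example. The comb filtering and the classification of second-pitch candidates are not shown.

```python
import numpy as np

def multiscale_product(x, scales=(1, 2, 4)):
    """Product of wavelet responses of x at several scales.

    Assumption: the wavelet is the first derivative of a Gaussian,
    sampled at dyadic scales; the abstract does not specify this.
    """
    product = np.ones(len(x))
    for s in scales:
        half = 4 * s
        t = np.arange(-half, half + 1)
        kernel = -t * np.exp(-t**2 / (2.0 * s**2))  # derivative-of-Gaussian shape
        kernel /= np.abs(kernel).sum()               # keep responses comparable across scales
        product *= np.convolve(x, kernel, mode="same")
    return product

def amp_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate one pitch (Hz) from the autocorrelation of the multi-scale product.

    fmin and fmax are assumed search bounds; the frame must be longer
    than fs / fmin samples.
    """
    p = multiscale_product(frame)
    p = p - p.mean()
    # One-sided autocorrelation (lags 0 .. len(p) - 1).
    ac = np.correlate(p, p, mode="full")[len(p) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag

# Usage on a synthetic pulse train with a period of 80 samples at 8 kHz.
fs = 8000
frame = np.zeros(1024)
frame[::80] = 1.0
f0 = amp_pitch(frame, fs)
```

The product across scales reinforces sharp discontinuities, such as glottal closure instants, that persist at every scale, and suppresses noise that does not. This is why the autocorrelation is taken on the product rather than on the raw signal.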

