Abstract

This paper discusses the role of the fundamental frequency (f0) and the formants F1, F2, and F3 of the speech signal in unsupervised source separation of real recorded convolutive speech mixtures. In unsupervised source separation, there is no prior knowledge of the underlying sources or the mixing conditions. We observed that supervised source separation using both f0 and the formants gives more accurate separation results than using f0 or the formants alone. Hence, it is used as the ideal case, or reference, against which the results of unsupervised source separation are compared. Unsupervised source separation is discussed using (1) the cross-correlation of formants of different frames along with f0 and (2) the standard deviation of the magnitude of frequency components in the F1, F2, and F3 regions of the spectrogram. The separation results obtained with the proposed unsupervised methods are very close to the ideal case. The results show that the proposed approach performs better than classical BSS (Blind Source Separation) algorithms such as ICA (Independent Component Analysis) and NMF (Non-negative Matrix Factorization), which work well only for instantaneous mixtures, where delay is neglected, and for the supervised case, respectively.
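
As a rough illustration of feature (2), the sketch below computes the standard deviation of spectral magnitudes inside three formant regions for a single frame. It is a minimal sketch under stated assumptions, not the method of the paper: the band edges in `FORMANT_BANDS`, the function names, the frame length, and the synthetic test tone are all hypothetical, and the abstract does not say how the feature is then used to separate the sources.

```python
import cmath
import math
import statistics

# Assumed formant regions in Hz (hypothetical; the abstract gives no band edges).
FORMANT_BANDS = {
    "F1": (200.0, 1000.0),
    "F2": (1000.0, 2500.0),
    "F3": (2500.0, 3500.0),
}


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency bins of a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def band_std(magnitudes, sample_rate, n_fft, bands=FORMANT_BANDS):
    """Population standard deviation of bin magnitudes inside each band."""
    result = {}
    for name, (lo, hi) in bands.items():
        # Bin k corresponds to the frequency k * sample_rate / n_fft.
        in_band = [m for k, m in enumerate(magnitudes)
                   if lo <= k * sample_rate / n_fft < hi]
        result[name] = statistics.pstdev(in_band) if in_band else 0.0
    return result


sample_rate = 8000  # Hz, assumed
n_fft = 256         # samples per frame, assumed

# Synthetic frame: a single 500 Hz tone, which lies in the assumed F1 band.
frame = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n_fft)]
features = band_std(magnitude_spectrum(frame), sample_rate, n_fft)
```

For this synthetic frame, all the energy sits in one bin of the assumed F1 region, so the F1 value is much larger than the F2 and F3 values. On real speech the same function would be applied to every frame of the spectrogram, with a windowed FFT replacing the naive DFT.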