This paper presents a novel multi-speaker segmentation technique that uses wavelets and support vector machines (SVMs) to segment individual speakers' speech signals from multi-dialog environments. The proposed method first applies wavelets to extract acoustical features, such as subband power and pitch information, from the given multi-dialog speech data. Multi-speaker segmentation is then performed by a bottom-up SVM operating on these acoustical features together with additional parameters, such as frequency cepstral coefficients. The performance of the proposed method is evaluated on the public Aurora-2 audio database. Experimental results show that the method achieves 100% segmentation accuracy for two-speaker combinations, and at least 94.12% and 85.93% accuracy under 4-speaker and 8-speaker conditions, respectively.
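The abstract does not specify the wavelet family or the SVM configuration used. Purely as an illustrative sketch of the wavelet feature-extraction stage (not the authors' actual implementation), the following Python computes per-frame subband powers with a Haar wavelet decomposition implemented in NumPy; the frame length, sampling rate, and number of decomposition levels are assumptions.

```python
import numpy as np

def haar_step(x):
    # One level of Haar wavelet decomposition:
    # returns the approximation (lowpass) and detail (highpass) halves.
    x = x[: len(x) // 2 * 2]          # drop an odd trailing sample, if any
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def subband_powers(frame, levels=3):
    # Decompose a frame `levels` times and return the mean power of each
    # detail subband (highest frequency band first), then the final
    # approximation band. These powers could serve as SVM input features.
    powers = []
    a = np.asarray(frame, dtype=float)
    for _ in range(levels):
        a, d = haar_step(a)
        powers.append(float(np.mean(d ** 2)))
    powers.append(float(np.mean(a ** 2)))   # residual low-frequency band
    return powers

# Toy usage: two synthetic frames with different dominant frequencies,
# standing in for frames from different speakers.
fs = 8000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.032, 1.0 / fs)          # one 32 ms frame (256 samples)
low = np.sin(2 * np.pi * 200 * t)          # low-pitched frame
high = np.sin(2 * np.pi * 1500 * t)        # higher-pitched frame
p_low = subband_powers(low)
p_high = subband_powers(high)
# The low-pitched frame concentrates energy in the final (low-frequency)
# band, while the higher-pitched frame peaks in an earlier detail band,
# giving a classifier a feature to separate the two.
```

In a full pipeline, such per-frame feature vectors (together with pitch and cepstral parameters) would be fed to an SVM to label which speaker is active in each frame; that classification stage is omitted here.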