Abstract

This paper shows the benefit of supplementary information in speaker diarization and detection systems. This information is a priori knowledge about the speakers: the number of speakers involved in the audio stream, and training data available for one speaker or for all the speakers involved in the conversation. Two speaker diarization systems are built using two clustering approaches: Hierarchical Ascending Classification (HAC) and Support Vector Machine (SVM) models. The impact of this a priori information is evaluated in terms of diarization error rate (DER) and speaker detection rate (SDR). Experiments on NIST2005 show that diarization and detection performance is better when both kinds of information are used (the number of speakers and training data available for one speaker) than when only the number of speakers is known. The results also show that speaker segmentation with SVM yields an absolute diarization error approximately 12.01% lower than the HAC method.

