Abstract

This paper describes the implementation of an unsupervised speaker segmentation and clustering system. The main objective is to study the performance of a speaker diarization system using a new feature set, Temporal Energy of Subband Cepstral Coefficients (TESBCC), together with pitch-based features. The system first classifies the audio signal into speech and nonspeech using the average zero-crossing rate (ZCR), followed by a gender classifier stage. Speaker changes are first roughly detected using the Hotelling T² distance metric, and the Bayesian information criterion (BIC) is then used to validate each potential speaker change point, reducing the false alarm rate. A bottom-up approach is used for speaker clustering. The performance of the speaker segmentation and clustering system using TESBCC is compared with that of the same system using MFCC.
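
To make two of the stages above concrete, the following is a minimal sketch of a per-frame zero-crossing rate and of a BIC-based check on a candidate speaker change point. The abstract does not give the paper's exact formulas, so this assumes the commonly used ΔBIC formulation with one full-covariance Gaussian per segment. The function names `zero_crossing_rate` and `delta_bic` and the parameter `penalty_weight` are hypothetical and do not come from the paper.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in a frame whose signs differ."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))


def _log_det_cov(features):
    """Log-determinant of the maximum-likelihood covariance of the rows."""
    cov = np.atleast_2d(np.cov(features, rowvar=False, bias=True))
    _, log_det = np.linalg.slogdet(cov)
    return log_det


def delta_bic(features, t, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at frame index t.

    Assumed formulation (not taken from the paper): each segment is
    modelled by one full-covariance Gaussian. A positive value favours
    two separate models, i.e. a speaker change at t.
    """
    features = np.asarray(features, dtype=float)
    n, d = features.shape
    n1, n2 = t, n - t
    # Gain in log-likelihood from using two Gaussians instead of one.
    gain = 0.5 * (
        n * _log_det_cov(features)
        - n1 * _log_det_cov(features[:t])
        - n2 * _log_det_cov(features[t:])
    )
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return float(gain - penalty)
```

In a two-pass scheme like the one described, a cheaper distance such as Hotelling T² would propose candidate indices `t`, and only candidates with `delta_bic(...) > 0` would be kept as speaker change points. The speech/nonspeech threshold on the ZCR is not given in the abstract and is left out here.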
