This study examines how ten different audio features, including MFCC, mel-spectrogram, chroma, and spectral contrast etc., influence speaker change detection (SCD) performance. The analysis is conducted using two unsupervised methods: Bayesian information criterion with Gaussian mixture model (BIC-GMM), a model-based approach, and Kullback-Leibler divergence with Gaussian Mixture Model (KL-GMM), a metric-based approach. Evaluation involved statistical analysis of feature changes in relation to speaker changes (vice versa), supported by comprehensive experimental validation. Experimental results show MFCC as the most effective feature, demonstrating consistently good performance across both methods. Features such as zero crossing rate, chroma, and spectral contrast also showed notable effectiveness within the BIC-GMM framework, while mel-spectrogram consistently ranked as the least influential feature in both approaches. Further analysis revealed that BIC-GMM exhibits greater stability in managing variations in feature performance, whereas KL-GMM is more sensitive to threshold optimization. Nevertheless, KL-GMM achieved competitive results when paired with specific features, such as MFCC and zero crossing rate. These findings offer valuable insights into the impact of feature selection on unsupervised SCD, providing guidance for the development of more robust and accurate algorithms for practical applications.
Read full abstract