A Particle Swarm Optimization-Based Approach to Speaker Segmentation Based on Independent Component Analysis on GSM Digital Speech

S M Mirrezaie, Amir Asnaashari, Karim Faez, Ali Ziaei

doi:10.1109/isspit.2008.4775731

Abstract

The Adaptive Multi-Rate (AMR) codec was standardized for GSM in 1999. AMR offers a substantial improvement in error robustness over previous GSM speech codecs by adapting speech and channel coding to channel conditions. The AMR speech codec has been adopted as a standard for IMT-2000 by ETSI and 3GPP and consists of eight source codecs with bit rates from 4.75 to 12.2 kbit/s. In this paper, we present an approach based on particle swarm optimization (PSO) that encodes possible segmentations of an audio record and measures the mutual information between these segments and the audio data. This measure is used as the fitness function for the PSO. We adopt a compact encoding of the solution that shortens the PSO individuals and improves the convergence properties of the PSO. The algorithm has been tested for speaker segmentation on two real data sets in AMR format, obtaining very good results on all test problems. The results are compared with those of a widely used genetic algorithm-based approach in several practical situations. No assumptions are made about prior knowledge of the speech signal characteristics. However, we assume that the speakers do not speak simultaneously and that there are no real-time constraints.
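The abstract does not spell out the encoding or how the mutual information is estimated, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each audio frame has already been reduced to a discrete symbol (for example a quantized feature index), that the number of speaker change points is fixed in advance, and that a particle stores only the boundary positions rather than one value per frame. All function names and parameter values (`mutual_information`, `decode`, `fitness`, `pso_segment`, the inertia and acceleration constants) are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(segment_ids, symbols):
    """Empirical mutual information (bits) between two equal-length label sequences."""
    n = len(symbols)
    joint = Counter(zip(segment_ids, symbols))
    count_seg = Counter(segment_ids)
    count_sym = Counter(symbols)
    mi = 0.0
    for (seg, sym), c in joint.items():
        mi += (c / n) * math.log2(c * n / (count_seg[seg] * count_sym[sym]))
    return mi


def decode(position, n_frames):
    """Assumed compact encoding: a particle is a list of boundary positions.

    Returns one segment id per frame.
    """
    cuts = sorted(min(n_frames - 1, max(1, int(round(x)))) for x in position)
    ids, seg, k = [], 0, 0
    for t in range(n_frames):
        while k < len(cuts) and t >= cuts[k]:
            seg += 1
            k += 1
        ids.append(seg)
    return ids


def fitness(position, symbols):
    """Fitness of a candidate segmentation: MI between segment ids and frame symbols."""
    return mutual_information(decode(position, len(symbols)), symbols)


def pso_segment(symbols, n_cuts, n_particles=20, n_iter=60,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over boundary positions (illustrative settings)."""
    rng = random.Random(seed)
    n = len(symbols)
    pos = [[rng.uniform(1, n - 1) for _ in range(n_cuts)] for _ in range(n_particles)]
    vel = [[0.0] * n_cuts for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, symbols) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_cuts):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep every boundary strictly inside the recording.
                pos[i][d] = min(n - 1, max(1, pos[i][d] + vel[i][d]))
            f = fitness(pos[i], symbols)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return sorted(int(round(x)) for x in gbest), gbest_fit


# Toy usage: two "speakers", each emitting a different symbol for 50 frames.
toy_symbols = [0] * 50 + [1] * 50
toy_cuts, toy_fit = pso_segment(toy_symbols, n_cuts=1)
```

With boundary positions as the particle, the search space has one dimension per change point instead of one per frame, which is one plausible reading of the "compact encoding" mentioned in the abstract.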

Full Text