ABSTRACT Humans have a more powerful auditory ability to discriminate and identify speakers in noisy environments than traditional forensic automatic speaker recognizers, which do not perform well on noisy recordings. This paper proposes a GMM-UBM Forensic Automatic Speaker Recognition (FASR) system designed to reduce the effect of noise on performance. The system uses Gammatone Frequency Cepstral Coefficients (GFCC), which are based on a model of the auditory periphery, together with Principal Component Analysis (PCA). The system was tested and validated on Mandarin voice databases corrupted with different levels of white noise and office noise. Its performance was compared, under the same conditions, with that of a baseline system using Mel Frequency Cepstral Coefficients (MFCC) and PCA. The results show that the combined GFCC system substantially outperformed the baseline MFCC system under high levels of office noise.