A FRAMEWORK FOR MULTILINGUAL TEXT-INDEPENDENT SPEAKER IDENTIFICATION SYSTEM

Sundaradhas Selva Nidhyananthan, R Shantha Selva Kumari

doi:10.3844/jcssp.2014.178.189

Open Access

Journal: Journal of Computer Science	Publication Date: Jan 1, 2014
Citations: 5	License type: cc-by

Abstract

This article evaluates the performance of the Extreme Learning Machine (ELM) and the Gaussian Mixture Model (GMM) in the context of text-independent multilingual speaker identification for recorded and synthesized speech. The type and number of filters in the filter bank, the number of samples in each frame of the speech signal, and the fusion of model scores play a vital role in speaker identification accuracy and are analyzed in this article. The Extreme Learning Machine uses a single-hidden-layer feedforward neural network for multilingual speaker identification. The individual Gaussian components of the GMM represent speaker-dependent spectral shapes that are effective in characterizing speaker identity. Both modeling techniques make use of Linear Predictive Residual Cepstral Coefficient (LPRCC), Mel Frequency Cepstral Coefficient (MFCC), Modified Mel Frequency Cepstral Coefficient (MMFCC) and Bark Frequency Cepstral Coefficient (BFCC) features to represent the speaker-specific attributes of speech signals. Experimental results show that GMM outperforms ELM, with a speaker identification accuracy of 97.5% at a frame size of 256 samples, a frame shift of half the frame size, and a filter bank size of 40.
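
As an illustration of the framing parameters reported in the abstract (frame size 256, frame shift of half the frame size), the following minimal sketch splits a signal into overlapping frames. The function name `frame_signal`, the toy signal, and the choice to drop a trailing partial frame are assumptions made for illustration; the article does not specify them.

```python
def frame_signal(samples, frame_size=256, frame_shift=None):
    """Split a 1-D sequence of samples into overlapping frames.

    Hypothetical helper: by default the shift is half the frame size,
    matching the setting reported in the abstract.
    """
    if frame_shift is None:
        frame_shift = frame_size // 2
    if frame_size <= 0 or frame_shift <= 0:
        raise ValueError("frame_size and frame_shift must be positive")

    frames = []
    start = 0
    # Assumption: a trailing partial frame is dropped rather than zero-padded.
    while start + frame_size <= len(samples):
        frames.append(list(samples[start:start + frame_size]))
        start += frame_shift
    return frames


# Toy signal of 1024 samples standing in for a speech recording.
signal = list(range(1024))
frames = frame_signal(signal)
print(len(frames), len(frames[0]))
```

With a half-frame shift, consecutive frames share 128 samples, so each sample (except those at the edges) contributes to two frames before feature extraction.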

Highlights

In automatic speaker recognition, an algorithm plays the listener’s role in decoding the speech into a hypothesis concerning the speaker’s identity. The speaker’s voice is recorded and typically a number of features are extracted to form a voice print model
Extreme Learning Machine (ELM) and Gaussian Mixture Model (GMM) based speaker identification is performed under different frame size and filter bank size conditions, and the identification performance is analyzed
An overall identification rate of 79.25% is achieved for the Modified Mel Frequency Cepstral Coefficient (MMFCC) feature with a frame size of 256 using the ELM modeling technique

Summary

Introduction

In automatic speaker recognition, an algorithm plays the listener’s role in decoding the speech into a hypothesis concerning the speaker’s identity. The speaker’s voice is recorded and typically a number of features are extracted to form a voice print model. The Extreme Learning Machine (ELM) modeling technique is used to provide better performance than the traditional tuning-based learning methods (Bharathi and Natarajan, 2011). It provides the best generalization performance at extremely fast learning speed. The input weights and hidden neurons or kernel parameters do not necessarily need to be tuned.
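
The idea that the input weights and hidden neurons need not be tuned can be sketched as follows. This is a minimal illustration of the commonly described ELM formulation, in which the hidden layer is random and fixed and only the output weights are solved by least squares, not the authors' implementation. The function names `train_elm` and `predict_elm`, the tanh activation, the hidden layer size, and the synthetic two-speaker data are all assumptions.

```python
import numpy as np


def train_elm(X, T, n_hidden=50, seed=0):
    """Fit a single-hidden-layer feedforward network in the ELM style.

    Hypothetical helper: input weights W and biases b are drawn at random
    and never tuned; only the output weights beta are computed.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T      # least-squares output weights
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Return the index of the highest-scoring class for each row of X."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


# Synthetic stand-in for cepstral features: two "speakers" drawn from
# well-separated Gaussian clusters of 12-dimensional vectors (assumed data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 12)),
               rng.normal(2.0, 1.0, (100, 12))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]                 # one-hot targets

W, b, beta = train_elm(X, T)
accuracy = np.mean(predict_elm(X, W, b, beta) == labels)
print(f"training accuracy: {accuracy:.3f}")
```

Because training reduces to one pseudo-inverse computation rather than iterative gradient updates, the fit is fast, which is the property the paragraph above attributes to ELM.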
