Abstract

This paper investigates the effect of speaker-to-microphone distance on Gaussian mixture models (GMMs) for text-dependent speaker identification. The experiments are carried out in three stages, one for each distance from the microphone (1 m, 2 m, and 3 m). The feature extraction methods used are Mel-frequency cepstral coefficients (MFCC), Bark-frequency cepstral coefficients (BFCC), and linear predictive cepstral coefficients (LPCC). The features are obtained from 20 speakers (10 adults and 10 children), all of whom spoke five Arabic words in 5 seconds. Two classifiers are compared: GMM and a multilayer perceptron (MLP) neural network. Overall, MFCC gives the best performance among the feature sets, and GMM outperforms MLP, with a total recognition rate of 93.15% against 88.06% for MLP. The results also show that the recognition rate decreases from 93.15% to 80.82% as the distance increases from 1 m to 3 m.



Summary

1- Introduction

The speech signal conveys several levels of information. Primarily, it conveys the words or message being spoken, but on a secondary level it also conveys information about the identity of the talker. Speaker recognition uses the acoustic features of the speech signal to discriminate between individuals. These acoustic features can vary greatly from one speaker to another, depending on anatomy and behavioral characteristics. Many algorithms and models can be used for speaker recognition, including neural networks, unimodal Gaussians, vector quantization, radial basis functions, hidden Markov models, and Gaussian mixture models (GMMs). These perform well under clean speech conditions, but in many cases performance degrades when test utterances are corrupted by noise, when conditions are mismatched, or when only small amounts of training and testing data are available. The resulting features are compared at the previously mentioned distances; the comparison shows that MFCC has the best performance and that GMM achieves better recognition than MLP. Besides this introduction, the paper contains three further sections.
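To illustrate how a GMM-based identifier typically decides between speakers, the sketch below scores a set of feature frames against one GMM per speaker and picks the model with the highest total log-likelihood. It is a minimal sketch, not the authors' implementation: the diagonal covariances, the two-dimensional toy features, the model parameters, and names such as `identify` and `speaker_a` are assumptions made for the example. In practice the frames would be MFCC, BFCC, or LPCC vectors, and the models would be trained from enrollment speech rather than written by hand.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    `gmm` is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        # log of sum_k w_k * N(x; mu_k, var_k), computed with log-sum-exp
        terms = [
            math.log(w) + log_gaussian_diag(x, mu, var)
            for w, mu, var in gmm
        ]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify(frames, models):
    """Return the speaker whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


# Hypothetical two-component models over toy 2-D features (assumed values).
models = {
    "speaker_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [2.0, 2.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [7.0, 4.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [1.8, 2.1], [0.5, 0.3]]
print(identify(test_frames, models))
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification in GMM speaker identification; the log-sum-exp step only guards against numerical underflow and does not change the decision.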

Block Diagram Of Speaker Recognition
Feature Extraction
Bark Frequency Cepstrum Coefficient
Classification
Conclusions
