Abstract

The extreme learning machine (ELM) is a machine learning technique used for regression and classification. In this paper, an extended comparison between the ELM-based and the backpropagation neural network (BPNN)-based i-vector is given for a closed-set speaker identification task using 120 speakers from the TIMIT database. The feature extraction stage uses mel frequency cepstral coefficients (MFCC) and power normalized cepstral coefficients (PNCC), while cepstral mean variance normalization (CMVN) and feature warping are applied to mitigate the linear channel effect. The 120 speakers are split equally between both genders and cover eight dialects. The results demonstrate that, for the different features, the combination of the i-vector with the ELM has a higher speaker identification accuracy (SIA) than the combination of the i-vector with the BPNN. The results also show that the i-vector with ELM approach is faster than the BPNN-based i-vector.

Highlights

  • There are several open issues in machine learning techniques, such as intensive human interference, slow learning speed, and poor learning [1]

  • This paper combines the i-vector approach with the extreme learning machine (ELM) for speaker identification. The main motivation is that the combination gives higher speaker identification accuracy (SIA) than either technique alone

  • In this paper, a fair comparison between the ELM-based and the backpropagation neural network (BPNN)-based i-vector is given for a closed-set speaker identification task using the TIMIT database


Summary

Introduction

There are several open issues in machine learning techniques, such as intensive human interference, slow learning speed, and poor learning [1]. Although the ELM was successfully employed for speaker identification in our previous studies [10,11,12], the models described there are time-consuming, and comparisons with other neural network methods were not considered. We compare two classifiers, based on the ELM and the BPNN, in terms of their speed and SIA in order to evaluate closed-set speaker identification performance.

The main block diagram

Figure 1 shows the main block diagram, which uses MFCC and PNCC for feature extraction, CMVN and feature warping for feature normalization, the i-vector as the acoustic model, and the BPNN and ELM as the classifiers. The PNCC implementation starts by pre-emphasizing the speech signal; a short-time Fourier transform (STFT) is then performed using a Hamming window of 16 ms duration with an 8 ms frame period. The mean power is estimated for each frame, and its running average is used for time-frequency normalization.
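The first PNCC front-end steps described above (pre-emphasis, Hamming-windowed STFT, and per-frame mean power), together with CMVN, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 16 kHz sample rate is the TIMIT rate, the 16 ms window and 8 ms frame period come from the text, while the pre-emphasis coefficient of 0.97 and all function names are assumptions introduced here. The running average, feature warping, and the later PNCC stages are not covered.

```python
import numpy as np

SAMPLE_RATE = 16_000   # TIMIT audio is sampled at 16 kHz
FRAME_MS = 16          # Hamming window duration given in the text
HOP_MS = 8             # frame period given in the text
PREEMPH = 0.97         # assumed coefficient; the source does not state one


def preemphasize(signal, coeff=PREEMPH):
    """Apply the first-order filter y[n] = x[n] - coeff * x[n-1]."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[:1], signal[1:] - coeff * signal[:-1])


def stft_power(signal, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Return the power spectrum of each Hamming-windowed frame.

    Output shape: (num_frames, frame_len // 2 + 1).
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = sample_rate * frame_ms // 1000   # 256 samples at 16 kHz
    hop_len = sample_rate * hop_ms // 1000       # 128 samples at 16 kHz
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix: row i selects the samples of the i-th overlapping frame.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def frame_mean_power(power):
    """Mean power of each frame, averaged over frequency bins."""
    return power.mean(axis=1)


def cmvn(features, eps=1e-10):
    """Per-utterance CMVN: zero mean and unit variance in each feature dimension."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)
```

For a one-second utterance, `stft_power(preemphasize(x))` yields one row per 8 ms frame, and `frame_mean_power` reduces each row to the scalar on which a running average would then operate. `cmvn` is applied to the final feature matrix (frames by coefficients), not to the power spectrum.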

64 Speakers From TIMIT Database
The concept of BPNN
The concept of ELM
Experimental results and discussion
Conclusions