A Deep Neural Network Model for Speaker Identification

Feng Ye, Jun Yang

doi:10.3390/app11083603

Abstract

Speaker identification is a classification task that aims to identify a speaker from time-series speech data. Because the speech signal is a continuous one-dimensional time series, most current research methods are based on either a convolutional neural network (CNN) or a recurrent neural network (RNN). These methods perform well in many tasks, but there has been no attempt to combine the two network models for the speaker identification task. The spectrogram of a speech signal contains the spatial features of the voiceprint (which corresponds to the voice spectrum), and a CNN is effective for spatial feature extraction (which corresponds to modeling spectral correlations in acoustic features). At the same time, the speech signal is a time series, and a deep RNN can represent long utterances better than shallow networks. Considering the advantage of the gated recurrent unit (GRU) over a traditional RNN in the segmentation of sequence data, we use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and GRU for speaker identification. In the network design, the convolutional layer extracts voiceprint features and reduces dimensionality in both the time and frequency domains, which allows faster GRU layer computation. The stacked GRU recurrent layers then learn a speaker’s acoustic features. During this research, we also tried other neural network structures, including 2-D CNN, deep RNN, and deep LSTM. All of these network models were evaluated on the Aishell-1 speech dataset. The experimental results show that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96%. The results also demonstrate the effectiveness of the proposed deep GRU network model compared with the other models for speaker identification. With further optimization, this method could be applied to other research similar to speaker identification.
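The abstract's point that the convolutional layer shortens the sequence seen by the GRU can be illustrated with simple shape arithmetic. The sketch below is not the authors' configuration: the input size (300 frames by 64 frequency bins), the 5x5 kernel, the 2x2 stride, the 32 output channels, and the choice to flatten the frequency and channel axes per time step are all assumptions made for illustration. Only the output-length formula for an unpadded strided convolution is standard.

```python
def conv_output_len(n, kernel, stride):
    """Output length along one axis of an unpadded ('valid') convolution."""
    return (n - kernel) // stride + 1


def cnn_to_gru_shape(frames, freq_bins, kernel=(5, 5), stride=(2, 2), channels=32):
    """Return (time_steps, features_per_step) reaching the first GRU layer.

    All default values are hypothetical and not taken from the paper.
    """
    t_out = conv_output_len(frames, kernel[0], stride[0])
    f_out = conv_output_len(freq_bins, kernel[1], stride[1])
    # Assumption: frequency and channel axes are flattened into one
    # feature vector per remaining time step before the recurrent layers.
    return t_out, f_out * channels


# Hypothetical spectrogram: 300 frames x 64 frequency bins.
steps, features = cnn_to_gru_shape(frames=300, freq_bins=64)
print(steps, features)  # 148 960
```

Under these assumed settings the recurrent layers process 148 time steps instead of 300, which is the kind of reduction the abstract credits for faster GRU computation.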

Highlights

Speaker recognition [1] is an important bio-feature recognition method.
To speed up training, our proposed network schemes are implemented in Python using the TensorFlow deep learning library, which can run on a graphics processing unit (GPU).
To visualize the learning efficiency of the models, one training run of the proposed deep gated recurrent unit (GRU) network model on the Aishell-1 dataset is shown in Figure 8, where the left panel shows loss vs. training epochs and the right panel shows accuracy vs. training epochs.

Summary

Introduction

Speaker recognition [1] is an important bio-feature recognition method. It is the task of recognizing a person’s identity from the speaker’s speech signal. Owing to the unique characteristics of the speech signal, speaker recognition has for many years drawn increasing attention from researchers in broad fields of information security. Speaker recognition can be regarded as the use of statistical methods to identify individuals based on their unique acoustic properties, which are encoded in a sequence of successive samples in time. Speaker recognition can be divided into two modes: speaker verification and speaker identification [2,3,4]. We mainly conduct research on speaker identification.
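The two modes differ in the decision they make, which a minimal sketch can make concrete. It relies on the standard definitions rather than anything stated in this excerpt: identification picks one speaker out of an enrolled set, while verification accepts or rejects a single claimed identity. The speaker names, scores, and the 0.5 threshold are hypothetical.

```python
def identify(scores):
    """Closed-set identification: return the enrolled speaker with the top score."""
    return max(scores, key=scores.get)


def verify(score, threshold=0.5):
    """Verification: accept the claimed identity only if its score passes a threshold."""
    return score >= threshold


# Hypothetical per-speaker scores for one utterance (e.g. softmax outputs).
scores = {"spk_A": 0.07, "spk_B": 0.81, "spk_C": 0.12}

print(identify(scores))         # spk_B
print(verify(scores["spk_A"]))  # False
```

Identification is therefore an N-way classification problem, which is why its performance is reported as a recognition accuracy.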

Journal: Applied Sciences
Publication Date: Apr 16, 2021
Citations: 58
License type: CC BY 4.0
