Speaker Identification Performance Research Articles

Speaker identification aims to determine a speaker's identity by analyzing voice characteristics, and typically relies on statistical models or machine learning techniques. Frequency-domain features are by far the most common choice for encoding the audio input in sound recognition. Recently, some studies have also analyzed the use of the time-domain raw waveform (RW) with deep neural network (DNN) architectures. In this paper, we hypothesize that time-domain and frequency-domain features can be used together to increase the robustness of the speaker identification task in adverse noise and reverberation conditions, and we present a method based on a late fusion DNN using RWs and gammatone cepstral coefficients (GTCCs). We analyze the characteristics of RW and spectrum-based short-time features, reporting advantages and limitations, and we show that their joint use can increase the identification accuracy. The proposed late fusion DNN model consists of two independent DNN branches composed primarily of convolutional neural network (CNN) and fully connected neural network (NN) layers. The two DNN branches take as input short-time RW audio fragments and GTCCs, respectively. The late fusion is computed on the predicted scores of the DNN branches. Since the method is based on short segments, it has the advantage of being independent of the size of the input audio signal, and the identification task can be computed by summing the predicted scores over several short-time frames. Analysis of speaker identification performance computed with simulations shows that the late fusion DNN model improves the accuracy rate in adverse noise and reverberation conditions in comparison to the RW, GTCC, and mel-frequency cepstral coefficient (MFCC) features. Experiments with real-world speech datasets confirm the efficiency of the proposed method, especially with short audio samples.
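The score-level fusion and frame-wise summation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the two branches' scores are combined, so the weighted average, the `rw_weight` parameter, the softmax normalization, and the function names are all assumptions made for the example. The per-frame logits stand in for the outputs of the RW and GTCC branches.

```python
import math


def softmax(logits):
    """Turn one frame's raw branch outputs into scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_identify(rw_logits, gtcc_logits, rw_weight=0.5):
    """Fuse per-frame scores of two branches and sum them over frames.

    rw_logits, gtcc_logits: one list of per-speaker logits per short-time
    frame, for the raw-waveform and GTCC branches respectively.
    rw_weight: assumed fusion weight (hypothetical; not from the paper).
    Returns (index of the best-scoring speaker, accumulated scores).
    """
    if len(rw_logits) != len(gtcc_logits) or not rw_logits:
        raise ValueError("both branches need the same non-zero frame count")
    n_speakers = len(rw_logits[0])
    totals = [0.0] * n_speakers
    for rw_frame, gtcc_frame in zip(rw_logits, gtcc_logits):
        rw_scores = softmax(rw_frame)
        gtcc_scores = softmax(gtcc_frame)
        for k in range(n_speakers):
            # Late fusion: combine predicted scores, not features.
            fused = rw_weight * rw_scores[k] + (1.0 - rw_weight) * gtcc_scores[k]
            totals[k] += fused
    best = max(range(n_speakers), key=lambda k: totals[k])
    return best, totals


if __name__ == "__main__":
    # Two frames, three enrolled speakers; the values are made up.
    rw = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    gtcc = [[3.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    speaker, scores = late_fusion_identify(rw, gtcc)
    print(speaker, [round(s, 3) for s in scores])
```

Because the decision is a sum over frames, the same function works for any number of frames, which is the sense in which a short-segment method does not depend on the length of the input signal.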


Speaker recognition systems perform very well on datasets without noise or mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used the Mel Spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise levels and under mismatch between utterances. Similar to the Mel Spectrogram, the Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition. In addition, only a limited number of studies have used the Cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition using deep learning models is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet model architectures. The speaker identification and verification performance of both Cochleogram and Mel Spectrogram features is evaluated. The results show that the Cochleogram performs better than the Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
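A noise-added evaluation set of the kind mentioned in this abstract is commonly built by scaling a noise signal so that the mixture reaches a target SNR. The sketch below shows that arithmetic only; the abstract does not say how the authors added noise or which noise types they used, so the function names, the synthetic sine "speech", the Gaussian noise, and the 5 dB step between −5 dB and 20 dB are assumptions for illustration.

```python
import math
import random


def mean_power(signal):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (scale**2 * P_noise)), solved for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 220 Hz tone at 16 kHz and white noise.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr_db in (-5, 0, 5, 10, 15, 20):  # assumed 5 dB steps
        noisy = mix_at_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, noisy), 3))
```

At −5 dB the scaled noise carries roughly three times the power of the speech, which is why the low end of this range is the hardest condition for any feature representation.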


Articles published on Speaker Identification Performance

SIG: Speaker Identification in Literature via Prompt-Based Generation

Improving the Performance of Low-resourced Speaker Identification with Data Preprocessing

Evaluating the effects of task design on unfamiliar Francophone listener and automatic speaker identification performance

Speaker identification in courtroom contexts – Part II: Investigation of bias in individual listeners’ responses

A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients

Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

A Novel RBFNN-CNN Model for Speaker Identification in Stressful Talking Environments

Emotional speaker identification using a novel capsule nets model

Noise-robust text-dependent speaker identification using cochlear models.

Robust Speaker Identification Based on Binaural Masks

Speaker forensic identification using joint factor analysis and i-vector

CASA-based speaker identification using cascaded GMM-CNN classifier in noisy and emotional talking conditions

Binaural speaker identification using the equalization-cancelation technique

Influence of verbalization for voice on speaker identification performance

Quantifying Cochlear Implant Users' Ability for Speaker Identification using CI Auditory Stimuli.

Thorough evaluation of TIMIT database speaker identification performance under noise with and without the G.712 type handset

Novel cascaded Gaussian mixture model-deep neural network classifier for speaker identification in emotional talking environments

New Feature Vectors using GFCC for Speaker Identification

Noise-robust speech triage.

Emirati-accented speaker identification in each of neutral and shouted talking environments
