Speaker Identification Task Research Articles

The subject of this article is neural network models designed or adapted for voice analysis in the context of speaker identification and verification tasks. The goal of this work is to perform a comparative analysis of relevant neural network models in order to determine the model(s) that best meet the formulated criteria: model type, programming language of the model's implementation, parallelization potential, binary or multiclass classification, accuracy, and computational complexity. Some of these criteria, such as accuracy and computational complexity, were chosen because they matter regardless of the particular application. Others were chosen because of the architecture and challenges of the scientific communication system described in the work, which performs speaker identification and verification. The relevance of the paper lies in the prevalence of audio as a communication medium, which results in a wide range of practical applications of audio intelligence in various fields of human activity (business, law, military), as well as in the need to provide an efficient environment for inward-facing, audio-based scientific communication among young scientists, so that they can accelerate their research and acquire scientific communication skills. To achieve the goal, the following tasks were solved: criteria for judging the models were formulated based on the needs and challenges of the proposed model; models designed for speaker identification and verification were reviewed against the formulated criteria, with the results compiled into a comprehensive table; and optimal models were determined in accordance with the formulated criteria. The following neural-network-based models were reviewed: SincNet, VGGVox, Jasper, TitaNet, SpeakerNet, and ECAPA_TDNN. Conclusions: for future research and a practical solution to the problem of speaker authentication, it is reasonable to use a convolutional neural network implemented in the Python programming language, as it offers a wide variety of development tools and libraries.


Speaker recognition based on deep learning is currently the most advanced and mainstream technology in the industry. Adversarial attacks, an emerging and powerful class of attacks against neural network models, were first applied in the image domain and have gradually expanded to other domains, where they also pose serious security problems for speaker recognition. Common gradient-based attack methods such as FGSM, PGD, and MI-FGSM can deceive speaker recognition models with high confidence, yet their carefully crafted adversarial examples suffer from poor stealthiness and are easily perceived by the human ear. To improve the stealthiness of adversarial examples, this paper proposes a new attack method called the Adaptive Decay Attack (ADA), which is applied to three different scenarios in speaker recognition. The method uses a fixed number of iterations as the termination condition, automatically adjusts the maximum perturbation size according to whether the attack succeeds, and uses learning-rate decay schedules such as exponential decay and cosine annealing to continuously reduce the step size. The experimental results show that, for the two speaker recognition models x-vector and i-vector, the proposed attack improves the stealthiness metrics SNR and PESQ by at least 30% and 39%, respectively, compared with the best PGD attack for untargeted attacks on speaker identification. For the speaker identification task with targeted attacks, the average improvement is at least 20% and 25% compared to PGD. For the speaker verification task, the improvement is at least 29.5% and 33.4% compared to PGD. In addition, the attack method is used for adversarial training to enhance the robustness of the model. Experimental results show that ADA-based adversarial training takes 28.31% less time than PGD-based adversarial training, and the resulting robustness is generally superior to that of PGD-based adversarial training. Specifically, the attack success rates of the PGD and ADA methods decreased from 50.88% to 36.47% and from 64.74% to 45.82%, respectively.
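The abstract describes the attack loop only in outline, so the following is a minimal sketch of that outline and not the authors' implementation. It assumes a toy linear classifier (`scores = W @ x`) in place of an x-vector or i-vector model, a sign-gradient update as in PGD, and a simple rule that multiplies the perturbation bound by hypothetical `shrink` and `grow` factors after a successful or failed step. The function names, default values, and the use of SNR to pick the returned example are all illustrative assumptions.

```python
import math
import numpy as np


def exponential_decay(step0, k, rate=0.9):
    """Step size after k iterations of exponential decay."""
    return step0 * rate ** k


def cosine_annealing(step0, k, total, step_min=0.0):
    """Step size at iteration k of a cosine schedule over `total` iterations."""
    return step_min + 0.5 * (step0 - step_min) * (1.0 + math.cos(math.pi * k / total))


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB; higher means a less audible perturbation."""
    noise = adversarial - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def adaptive_decay_attack(x, label, W, n_iter=50, eps0=0.05, step0=0.01,
                          shrink=0.9, grow=1.1):
    """Untargeted attack on a toy linear classifier (an assumed stand-in model).

    Runs for a fixed number of iterations, decays the step size with a cosine
    schedule, and adapts the perturbation bound eps after every step.
    Returns the successful example with the highest SNR, or None.
    """
    eps = eps0
    x_adv = x.copy()
    best, best_snr = None, -np.inf
    for k in range(n_iter):
        step = cosine_annealing(step0, k, n_iter)
        scores = W @ x_adv
        # Strongest competing class: mask out the true label, then take argmax.
        masked = scores.copy()
        masked[label] = -np.inf
        rival = int(np.argmax(masked))
        # Gradient of (rival score - true score) with respect to the input.
        grad = W[rival] - W[label]
        x_adv = x_adv + step * np.sign(grad)
        # Project back into the current L-infinity ball around the clean input.
        x_adv = x + np.clip(x_adv - x, -eps, eps)
        if int(np.argmax(W @ x_adv)) != label:
            current = snr_db(x, x_adv)
            if current > best_snr:
                best, best_snr = x_adv.copy(), current
            eps *= shrink  # attack succeeded: try a smaller perturbation (assumed rule)
        else:
            eps *= grow    # attack failed: allow a larger perturbation (assumed rule)
    return best


if __name__ == "__main__":
    W = np.eye(2)
    x = np.array([1.0, 0.9])
    adv = adaptive_decay_attack(x, label=0, W=W)
    print("predicted class:", int(np.argmax(W @ adv)))
    print("SNR (dB):", round(float(snr_db(x, adv)), 2))
```

Swapping `cosine_annealing` for `exponential_decay` inside the loop gives the other schedule named in the abstract. PESQ is omitted because it requires a dedicated perceptual model.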

Articles published on Speaker Identification Task

Analysis of Machine Learning Classifiers for Speaker Identification: A Study on SVM, Random Forest, KNN, and Decision Tree

Evaluation of a foreign speaker in forensic phonetics: a report

Sound map of urban areas recorded by smart devices: case study at Okayama and Kurashiki

A multi-task network for speaker and command recognition in industrial environments

Voice separation and recognition using machine learning and deep learning a review paper

IIRI-Net: An interpretable convolutional front-end inspired by IIR filters for speaker identification

Influence of phase information on voice pre-processing signal in the authentication system

COMPARATIVE ANALYSIS OF NEURAL NETWORK MODELS FOR THE PROBLEM OF SPEAKER RECOGNITION

Enhancement in Speaker Identification through Feature Fusion using Advanced Dilated Convolution Neural Network

A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients

Power Normalized Gammachirp Cepstral (PNGC) coefficients-based approach for robust speaker recognition

Symmetric Saliency-Based Adversarial Attack to Speaker Identification

Generating Transferable Adversarial Examples for Speech Classification

Rationale and selection of voice signal pre-processing space in the authentication system

A Robust Approach for Speaker Identification Using Dialect Information

Hybrid machine learning classification scheme for speaker identification.

Attitudes of Educated Nigerians towards Varieties of English

A Highly Stealthy Adaptive Decay Attack Against Speaker Recognition

Speaker Naming in Arabic TV Programs

Joint speaker separation and recognition using non-negative matrix deconvolution with adaptive dictionary
