Speaker Recognition System Research Articles

Despite advances in speaker recognition technology, available systems often fail to recognize speakers correctly, especially in noisy environments. Mel-frequency cepstral coefficient (MFCC) features have been improved by pairing them with Convolutional Neural Networks (CNN), yet achieving high accuracy remains difficult. Hybrid algorithms combining MFCC and Region-based Convolutional Neural Networks (RCNN) have been found to be promising. The objectives of this research were to extract features from speech signals for speaker recognition, to design and develop a DFT-based denoising system using spectral subtraction, and to develop a speaker recognition method for Verbatim Transcription using MFCC. The DFT was used to transform the sampled audio waveform into a frequency-domain signal. RCNN was used to model the characteristics of speakers from their voice samples and to classify them into different identities. The novelty of the research is that it integrated MFCC with RCNN and optimized the combination with the Host-Cuckoo Optimization (HCO) algorithm. HCO further optimizes the weights by generating fit cuckoos that represent the best weights. It also captured temporal dependencies and long-term information. The system was tested and validated on audio recordings of different personalities from the National Assembly of Kenya. The results were compared with the actual identities of the speakers to confirm accuracy. The performance of the proposed approach was compared with two existing speaker recognition approaches, MFCC-CNN and Linear Predictive Coefficients (LPC)-CNN. The comparison was based on the Equal Error Rate (EER), False Rejection Rate (FRR), False Match Rate (FMR), and True Match Rate (TMR). Results show that the proposed algorithm outperformed the others, maintaining the lowest EER, FMR, and FRR and the highest TMR.
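The DFT-based denoising step can be illustrated with a minimal sketch of magnitude spectral subtraction on a single frame. This is not the authors' implementation: the function names (`dft`, `idft`, `estimate_noise_mag`, `spectral_subtract`), the zero floor, and the idea of estimating noise from noise-only frames are assumptions made for illustration, and a naive O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def estimate_noise_mag(noise_frames):
    """Average the magnitude spectrum over frames assumed to contain only noise."""
    spectra = [[abs(v) for v in dft(frame)] for frame in noise_frames]
    return [sum(bins) / len(bins) for bins in zip(*spectra)]


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract a per-bin noise magnitude estimate from one frame.

    The noisy phase is kept unchanged; magnitudes that would go below
    `floor` are clamped (an assumed, simple rectification rule).
    """
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(bin_value) - noise, floor)
        cleaned.append(cmath.rect(magnitude, cmath.phase(bin_value)))
    # For a real input frame and a symmetric noise estimate the result is
    # real up to rounding error, so the imaginary part is discarded.
    return [sample.real for sample in idft(cleaned)]


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
    print(spectral_subtract(frame, [0.5] * 8))
```

In practice the signal would be split into overlapping windowed frames and an FFT would replace the naive transform; the per-bin subtraction itself stays the same.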

To address the optimal extraction of DCT coefficients from facial images, the author proposes a DCT coefficient extraction method based on discriminant analysis, in which the DCT coefficients with high discriminant values are selected as features. The author's DPA-based discrete cosine coefficient selection method was compared with the traditional zigzag selection method in experiments on the ORL and Yale face databases. Recognition performance was higher on the ORL face database than on the Yale face database, because the ORL images contain relatively few expression and lighting changes, which makes key features easier to extract. To address the problem that the MFCC speech parameter is strongly affected by noise and reflects only the static characteristics of speech, the author used Gammatone filters and sliding differential cepstra to extract Gammatone filter cepstral coefficients (GFCC), which model human auditory characteristics, and Gammatone filter sliding differential cepstral coefficients (GFSDCC), which reflect the dynamic characteristics of speech. On the NUST603 speech database with a clean background, the recognition rate reached 89.88% with GFSDCC features and 87.52% with GFCC features, which is 4.66 and 2.36 percentage points higher, respectively, than with MFCC features. In noisy environments, the average recognition rates of speaker recognition systems based on GFCC and GFSDCC were 56.06% and 59.07%, which is 2.17 and 5.18 percentage points higher, respectively, than the 53.89% average of the MFCC-based system. The gain comes from the auditory model: the Gammatone filter effectively reflects the noise robustness of the human auditory system.
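The idea of selecting coefficients by discriminant value can be sketched as follows. The abstract does not give the scoring formula, so this example assumes one common formulation: the variance of the class means divided by the mean within-class variance, computed per coefficient position. The function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def discrimination_power(samples_by_class):
    """Score each coefficient position across labelled feature vectors.

    samples_by_class maps a class label to a list of equal-length vectors.
    Assumed score: variance of class means / mean within-class variance.
    """
    groups = list(samples_by_class.values())
    n_coeffs = len(groups[0][0])
    scores = []
    for j in range(n_coeffs):
        per_class = [[vec[j] for vec in vectors] for vectors in groups]
        within = mean([pvariance(values) for values in per_class])
        between = pvariance([mean(values) for values in per_class])
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfectly separable or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_coefficients(samples_by_class, k):
    """Return the indices of the k highest-scoring coefficients, in order."""
    scores = discrimination_power(samples_by_class)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: two classes, three coefficients per sample (made up).
    data = {
        "A": [[1.0, 5.0, 0.2], [1.2, 1.0, 0.1]],
        "B": [[3.0, 4.0, 0.3], [3.2, 2.0, 0.2]],
    }
    print(select_top_coefficients(data, 2))
```

In this toy data the second coefficient has identical class means, so it carries no discriminative information and is dropped, whereas a fixed zigzag scan would keep it purely because of its position.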

Articles published on Speaker Recognition System

Speaker Recognition System Using Hybrid of MFCC and RCNN with HCO Algorithm Optimization

Balancing validity and reliability as a function of sampling variability in forensic voice comparison

Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?

Speaker Recognition: A Comparative analysis Between Deep Learning and Non-Deep Learning Methodologies

Smart reception: An artificial intelligence driven bangla language based receptionist system employing speech, speaker, and face recognition for automating reception services

A Small Brazilian Portuguese Speech Corpus for Speaker Recognition Study

Version control of speaker recognition systems

Recurrence plot embeddings as short segment nonlinear features for multimodal speaker identification using air, bone and throat microphones

Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation

Transferable universal adversarial perturbations against speaker recognition systems

Enrollment-Stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound

Improvement Model for Speaker Recognition using MFCC-CNN and Online Triplet Mining

Speaker Recognition Using MFCC-BPNN-HHO

A multi-level power grid enhanced identity authentication data management platform based on filtering algorithms

Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset

INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION

Voice-based Biometric Data through Machine Learning Algorithm using Feature-based Classification with the Wood Texture using Android Studio

Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions

Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.

Using Voice Technologies to Support Disabled People
