Gaussian Mixture Model-Universal Background Model Research Articles

Automatic speech recognition (ASR) is the process of transforming speech waveforms into text so that computers can interpret and act upon human speech. It has found practical use in applications such as meeting transcription and smart speakers. The primary focus of this work is scalable strategies for developing ASR systems in languages for which no speech transcriptions or pronunciation dictionaries exist. We first show that, when a phonemic pronunciation lexicon exists for the new language, cross-lingual acoustic model transfer can greatly reduce the need for transcribed speech in the target language. We then investigate three approaches to dealing with languages that lack a pronunciation lexicon. Second, we examine the efficiency of graphemic acoustic model transfer, which makes it easy to build pronunciation dictionaries. The problems addressed in this thesis can be solved, in part, by investigating optimization strategies for training on large corpora (such as GA+HMM and DE+HMM). The suggested method is applied to traditional methods in the training phase of acoustic modelling. Experiments on read speech and HMI speech indicated that, although no single data augmentation strategy consistently improved recognition performance, using all three techniques together did. Power normalised cepstral coefficient (PNCC) features are slightly modified in this work to enhance verification accuracy. To increase speaker verification accuracy, we suggest employing multiple Gaussian Mixture Model-Universal Background Model (GMM-UBM) and SVM classifiers. Importantly, pitch shift data augmentation and multi-task training reduced bias by more than 18% absolute compared to the baseline system for read speech, and applying all three data augmentation techniques during fine-tuning reduced bias by more than 7% for HMI speech, while increasing recognition accuracy for both native and non-native Dutch speech.

Automatic Speaker Identification (ASI) is one of the active fields of research in signal processing. Various machine learning algorithms have been used for this purpose. With recent developments in hardware technologies and data accumulation, Deep Learning (DL) methods have become the new state-of-the-art approach in several classification and identification tasks. In this paper, we evaluate the performance of traditional methods such as the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and DL-based techniques such as the Factorized Time-Delay Neural Network (FTDNN) and Convolutional Neural Networks (CNN) for text-independent closed-set automatic speaker identification on two datasets with different conditions. One of the experimental datasets is LibriSpeech, which consists of clean audio signals from audiobooks collected from a large number of speakers. The other dataset, which we collected and prepared ourselves, contains a rather limited amount of speech with a low signal-to-noise ratio, taken from real-life conversations between customers and agents in a call center. The duration of the speech signals in the query phase is an important factor affecting the performance of ASI methods. In this work, a CNN architecture is proposed for automatic speaker identification from short speech segments. The architecture design aims to capture the temporal nature of the speech signal in a convolutional neural network with a low number of parameters compared to well-known CNN architectures. We show that the proposed CNN-based algorithm performs better on the large and clean dataset, whereas on the other dataset, with its limited amount of data, the traditional method outperforms all DL approaches. The top-1 accuracy achieved by the proposed model is 99.5% on 1-second voice instances from the LibriSpeech dataset.
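
For readers unfamiliar with the traditional baseline named in this abstract, the following is a minimal sketch of closed-set speaker identification with a GMM-UBM, not the authors' implementation. It assumes a pre-trained diagonal-covariance UBM, mean-only MAP adaptation with a relevance factor of 16, and scoring by average per-frame log-likelihood ratio; the toy one-dimensional "features", the speaker names, and all function names are hypothetical and chosen only for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, weights, means, variances):
    """Per-component log(weight * density) for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment frames."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        norm = logsumexp(logs)
        for k in range(n_comp):
            post = math.exp(logs[k] - norm)  # responsibility of component k
            counts[k] += post
            for d in range(dim):
                sums[k][d] += post * x[d]
    adapted = []
    for k in range(n_comp):
        if counts[k] == 0.0:
            adapted.append(list(means[k]))  # no evidence: keep the UBM mean
            continue
        alpha = counts[k] / (counts[k] + relevance)
        adapted.append([alpha * (s / counts[k]) + (1.0 - alpha) * m
                        for s, m in zip(sums[k], means[k])])
    return adapted

def llr_score(frames, weights, ubm_means, spk_means, variances):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = 0.0
    for x in frames:
        total += logsumexp(component_logs(x, weights, spk_means, variances))
        total -= logsumexp(component_logs(x, weights, ubm_means, variances))
    return total / len(frames)

def identify(frames, weights, ubm_means, variances, speaker_models):
    """Closed-set identification: return the enrolled speaker with the top score."""
    scores = {name: llr_score(frames, weights, ubm_means, spk_means, variances)
              for name, spk_means in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Toy UBM (assumed already trained): two components over 1-D features.
ubm_weights = [0.5, 0.5]
ubm_means = [[-1.0], [1.0]]
ubm_vars = [[1.0], [1.0]]

# Hypothetical enrollment frames for two speakers.
alice_enroll = [[-2.0 + 0.1 * i] for i in range(-5, 6)] * 2
bob_enroll = [[2.0 + 0.1 * i] for i in range(-5, 6)] * 2
speaker_models = {
    "alice": map_adapt_means(alice_enroll, ubm_weights, ubm_means, ubm_vars),
    "bob": map_adapt_means(bob_enroll, ubm_weights, ubm_means, ubm_vars),
}

query = [[-1.9], [-2.1], [-2.0]]
best, scores = identify(query, ubm_weights, ubm_means, ubm_vars, speaker_models)
print(best, scores)
```

In a real system the frames would be multi-dimensional cepstral features and the UBM would have many more components; the structure of the scoring step stays the same, and top-1 accuracy is the fraction of queries for which the top-scoring speaker is the true one.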

Articles published on Gaussian Mixture Model-Universal Background Model

Development of efficient techniques for ASR System for Speech Detection and Recognization system using Gaussian Mixture Model- Universal Background Model

Influence of tree-based multi-layer node information on scoring accuracy and speed of speaker verification

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification

Power Normalized Gammachirp Cepstral (PNGC) coefficients-based approach for robust speaker recognition

Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users.

Joint short-time speaker recognition and tracking using sparsity-based source detection

Depression assessment in people with Parkinson’s disease: The combination of acoustic features and natural language processing

Application of Artificial Intelligence on Post Pandemic Situation and Lesson Learn for Future Prospects

Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework

A Comparative Assessment of Text-independent Automatic Speaker Identification Methods Using Limited Data

Self-segmentation of pass-phrase utterances for deep feature learning in text-dependent speaker verification

A GMM supervector approach for spoken Indian language identification for mismatch utterance length

Combined i-Vector and Extreme Learning Machine Approach for Robust Speaker Identification and Evaluation with SITW 2016, NIST 2008, TIMIT Databases

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification With Autoregressive Prediction Coding

Based on machine learning scheme to develop a smart robot embedded with GMM-UBM

Two-Level Classification in Determining the Age and Gender Group of a Speaker

Middle Eastern and North African English Speech Corpus (MENAESC): Automatic Identification of MENA English Accents

Forensic speaker recognition: A new method based on extracting accent and language information from short utterances

Overlapping region reconstruction in nuclei image segmentation

Robust speaker recognition based on biologically inspired features
