A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation.

Mohammad Al-Qaderi, Ahmad Rad, Elfituri Lahamer

doi:10.3390/s21155097

Open Access

https://doi.org/10.3390/s21155097

Journal: Sensors	Publication Date: Jul 28, 2021
Citations: 12	License type: CC BY 4.0

Affiliation: Hashemite University, Simon Fraser University

Abstract

We present a new architecture to address the challenges of speaker identification that arise when humans interact with social robots. Although deep learning systems have achieved impressive performance in many speech applications, limited speech data at the training stage and short utterances with background noise at the test stage remain open problems, as no optimal solution has been reported to date. The proposed design combines a generative model, the Gaussian mixture model (GMM), with a discriminative model, the support vector machine (SVM), and uses both prosodic features and short-term spectral features to concurrently classify a speaker’s gender and identity. The architecture works in a semi-sequential manner and consists of two stages: the first classifier uses the prosodic features to determine the speaker’s gender, which is then passed, together with the short-term spectral features, to the second classifier system to identify the speaker. The second classifier system takes two types of short-term spectral features, namely mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), along with the gender information, as inputs to two different classifiers (GMM and GMM supervector-based SVM), which yields four classifiers in total. The outputs of the second-stage classifiers, namely the GMM-MFCC maximum likelihood classifier (MLC), GMM-GFCC MLC, GMM-MFCC supervector SVM, and GMM-GFCC supervector SVM, are fused at score level by the weighted Borda count approach. The weight factors are computed on the fly by a Mamdani fuzzy inference system whose inputs are the signal-to-noise ratio and the utterance length. Experimental evaluations suggest that the proposed architecture and fusion framework are promising and can improve recognition performance in challenging environments where the signal-to-noise ratio is low and the utterance is short; such scenarios often arise when social robots interact with humans.
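
The fusion step described in the abstract can be illustrated with a minimal sketch of a generic weighted Borda count. The abstract does not give the exact formulation, so the scoring rule (n-1 points for the top-ranked speaker down to 0 for the last), the function name `weighted_borda`, the classifier keys, the speaker labels, and the fixed weights below are all assumptions made for illustration. In the paper the weights come from a Mamdani fuzzy inference system driven by signal-to-noise ratio and utterance length; here they are simply hard-coded.

```python
from typing import Dict, List


def weighted_borda(rankings: Dict[str, List[str]],
                   weights: Dict[str, float]) -> Dict[str, float]:
    """Fuse per-classifier rankings (best speaker first) into one score per speaker."""
    totals: Dict[str, float] = {}
    for name, ranking in rankings.items():
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            # Top-ranked speaker earns n-1 points, last earns 0,
            # scaled by the weight assigned to this classifier.
            points = weights[name] * (n - 1 - position)
            totals[speaker] = totals.get(speaker, 0.0) + points
    return totals


# Hypothetical rankings from the four second-stage classifiers.
rankings = {
    "gmm_mfcc_mlc": ["spk_a", "spk_b", "spk_c"],
    "gmm_gfcc_mlc": ["spk_b", "spk_a", "spk_c"],
    "svm_mfcc_supervector": ["spk_a", "spk_c", "spk_b"],
    "svm_gfcc_supervector": ["spk_b", "spk_a", "spk_c"],
}

# Hypothetical fixed weights (assumed; the paper derives them from a fuzzy system).
weights = {
    "gmm_mfcc_mlc": 0.2,
    "gmm_gfcc_mlc": 0.4,
    "svm_mfcc_supervector": 0.1,
    "svm_gfcc_supervector": 0.3,
}

scores = weighted_borda(rankings, weights)
identified = max(scores, key=scores.get)
print(identified, scores)
```

With these assumed inputs, the two GFCC-based classifiers carry the most weight, so their preferred speaker wins even though the rankings are split two against two. This is the kind of behaviour the adaptive weighting is meant to exploit when some classifiers are more reliable under the current noise and utterance-length conditions.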

Similar Papers

Optimizing Integrated Features for Hindi Automatic Speech Recognition System
Mohit Dua ... Rajesh Kumar Aggarwal
Journal of Intelligent Systems | VOL. 29
01 Oct 2018

An ASR system using MFCC and VQ/GMM with emphasis on environmental dependency
Bidhan Barai ... Subhadip Basu
01 Dec 2017

Closed-Set Device-Independent Speaker Identification Using CNN
Tapas Chakraborty ... Nibaran Das
01 Jan 2020

Performance Analysis of Speaker Identification using Gaussian Mixture Model and Support Vector Machine
Aman Ranjan Verma ... S Premananda Singh
01 Nov 2019
