Abstract

This paper proposes a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system that operates under varied illumination conditions. Among the available fusion strategies, feature-level fusion is adopted, with a Hidden Markov Model (HMM) used for learning and classification. Because the feature set retains richer information about the raw biometric data than representations at any later stage, integration at the feature level is expected to yield better authentication results. Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, while Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are then fused at the feature level. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the audio and visual feature vectors. The performance of the proposed system is evaluated on the VALID audio-visual database under four different illumination levels. Experimental results demonstrate the significance of the proposed audio-visual speaker identification system across various combinations of audio and visual features.
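The pipeline the abstract describes (frame-level concatenation of audio and visual features, followed by PCA) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions and random data are assumptions standing in for real MFCC, LPCC, and ASM extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features; a real system would extract these
# from the audio and video streams (dimensions here are illustrative).
n_frames = 200
mfcc = rng.normal(size=(n_frames, 13))   # MFCC audio vectors
lpcc = rng.normal(size=(n_frames, 12))   # LPCC audio vectors
asm = rng.normal(size=(n_frames, 20))    # ASM shape + appearance features

# Feature-level fusion: concatenate the modalities frame by frame.
fused = np.hstack([mfcc, lpcc, asm])     # shape (200, 45)

# PCA via SVD of the mean-centred fused features.
centred = fused - fused.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
k = 15                                   # target dimensionality (assumed)
reduced = centred @ vt[:k].T             # shape (200, 15)
print(reduced.shape)
```

The reduced vectors would then serve as observation sequences for the HMM classifier.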

Highlights

  • Human speaker identification is bimodal in nature [1, 2]

  • If we have a problem in listening due to environmental noise, the visual information plays an important role for speech understanding [3]

  • An audio-only speaker identification system is not adequate to meet the variety of user requirements for person identification


Introduction

Human speaker identification is bimodal in nature [1, 2]. In a face-to-face conversation, we listen to what others say and at the same time observe their lip movements, facial expressions, and gestures. When environmental noise makes listening difficult, visual information plays an important role in speech understanding [3]. Visual speech information can also improve natural and robust human-computer interaction [5, 6]. Fusion of audio and visual features is an important strategy that can improve the performance of an AVSI system. The subsequent sections of the paper present the proposed block diagram, extraction of the speech and facial features, fusion of the multimodal audio and visual feature vectors, dimensionality reduction of the combined features, classification using HMM, and performance analysis of the proposed AVSI system.
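The closed-set identification step at the end of this pipeline can be illustrated with a toy example. For brevity, the per-speaker HMMs of the paper are replaced here by a single diagonal Gaussian per speaker; the speaker count, feature dimension, and synthetic data are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)


def fit_gaussian(frames):
    """Fit a diagonal Gaussian to feature frames (a simplified
    stand-in for the per-speaker HMMs used in the paper)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6


def log_likelihood(frames, mean, var):
    # Sum of per-frame diagonal-Gaussian log densities.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)


# Hypothetical fused (audio-visual) training features for three
# enrolled speakers, each with a distinct feature-space location.
train = {s: rng.normal(loc=s, size=(100, 15)) for s in range(3)}
models = {s: fit_gaussian(frames) for s, frames in train.items()}

# Identify an unseen utterance: choose the enrolled speaker whose
# model assigns the test frames the highest likelihood.
test = rng.normal(loc=1, size=(40, 15))  # frames drawn near speaker 1
best = max(models, key=lambda s: log_likelihood(test, *models[s]))
print(best)  # → 1
```

With HMMs, `log_likelihood` would be replaced by the forward-algorithm score of each speaker's model, but the argmax decision rule is the same.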

Paradigm of the Proposed Audio-Visual Speaker Identification System
Audio Feature Extraction and Fusion
Visual Feature Extraction and Fusion
LPCC-Based Speech Features
Audio-Visual Feature Fusion and HMM Classification
Performance Analysis of the Proposed System
Findings
Conclusions and Observations
