Audio-visual Speaker Identification Research Articles

This paper presents an improved likelihood ratio score fusion technique, based on a back-propagation learning neural network, for audio-visual speaker identification in various noisy environments. Different signal preprocessing and noise removal techniques are used to process the speech utterance, and LPC, LPCC, RCC, MFCC, ΔMFCC and ΔΔMFCC methods are applied to extract features from the audio signal. An Active Shape Model is used to extract appearance- and shape-based facial features. To enhance the performance of the proposed system, the appearance- and shape-based facial features are concatenated, and Principal Component Analysis is used to reduce the dimension of the facial feature vector. The audio and visual feature vectors are then fed separately to Hidden Markov Models to obtain the log-likelihood of each modality. The reliability of each modality is calculated using a reliability measurement method. Finally, these integrated likelihood ratios are fed to a back-propagation learning neural network to produce the final speaker identification result. To measure the performance of the proposed system, three databases are used: the NOIZEUS speech database, the ORL face database and the VALID audio-visual multimodal database, for audio-only, visual-only and audio-visual speaker identification, respectively. To compare the accuracy of the proposed system with existing techniques under various noisy environments, different types of artificial noise are added at various rates to the audio and visual signals, and performance is compared across different variations of audio and visual features.
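The fusion step described in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the reliability measure or the network architecture, so everything here is an assumption: the hypothetical `reliability` function uses the mean gap between the best log-likelihood and all the others, and a simple weighted sum followed by an argmax stands in for the back-propagation network. It is not the authors' method.

```python
def reliability(log_likelihoods):
    """Assumed reliability measure (not from the abstract): the mean gap
    between the best log-likelihood and all the others. A flat score
    profile, typical of a noisy modality, gives a value near zero."""
    best = max(log_likelihoods)
    n = len(log_likelihoods)  # requires at least two enrolled speakers
    return sum(best - ll for ll in log_likelihoods) / (n - 1)


def fuse_scores(audio_ll, visual_ll):
    """Weight each modality's per-speaker log-likelihoods by its relative
    reliability and pick the speaker with the highest fused score.
    The argmax here replaces the paper's neural network stage."""
    r_a = reliability(audio_ll)
    r_v = reliability(visual_ll)
    total = r_a + r_v
    # Fall back to equal weights when neither modality discriminates.
    w_a = 0.5 if total == 0 else r_a / total
    fused = [w_a * a + (1.0 - w_a) * v for a, v in zip(audio_ll, visual_ll)]
    best_speaker = max(range(len(fused)), key=fused.__getitem__)
    return best_speaker, w_a, fused


# Hypothetical scores for three enrolled speakers: the audio scores are
# nearly flat (as under heavy noise), the visual scores favour speaker 1.
audio_scores = [-100.0, -101.0, -102.0]
visual_scores = [-80.0, -50.0, -90.0]
speaker, audio_weight, fused_scores = fuse_scores(audio_scores, visual_scores)
```

In this made-up example the flat audio profile receives a small weight, so the decision follows the visual stream. A trained network could learn a more flexible mapping from the same weighted inputs.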


The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not yet been thoroughly investigated. The aim of this work is therefore to propose a model-based feature extraction method that employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic properties of the muscles, such as viscosity, elasticity, and mass, which are extracted from the dynamic lip model. These parameters depend exclusively on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. These parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features is employed by adopting a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) are used to evaluate the performance of the multimodal system when efficient audio feature extraction methods are employed. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a phonetically rich sentence. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. At a 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91% to 98%. Results on identical twins showed a clear improvement for the dynamic muscle model-based system, with the identification rate of the audio-visual system rising from 87% to 96%.
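To make the idea of muscle parameters such as viscosity, elasticity, and mass more concrete, the following Python sketch fits a generic second-order model, x'' + a·x' + b·x = 0, to a sampled trajectory. The abstract does not describe the authors' dynamic lip model or their estimation procedure, so this is an assumed stand-in: the function name `fit_damping_and_stiffness` is hypothetical, the data are synthetic, and from a free response only the ratios a = viscosity/mass and b = elasticity/mass can be recovered.

```python
import math


def fit_damping_and_stiffness(x, dt):
    """Least-squares fit of x'' + a*x' + b*x = 0 to samples x taken every
    dt seconds. Returns (a, b), where a = viscosity/mass and
    b = elasticity/mass under the assumed mass-spring-damper model."""
    s_vv = s_vx = s_xx = s_av = s_ax = 0.0
    for i in range(1, len(x) - 1):
        # Central finite differences for velocity and acceleration.
        vel = (x[i + 1] - x[i - 1]) / (2.0 * dt)
        acc = (x[i + 1] - 2.0 * x[i] + x[i - 1]) / (dt * dt)
        s_vv += vel * vel
        s_vx += vel * x[i]
        s_xx += x[i] * x[i]
        s_av += acc * vel
        s_ax += acc * x[i]
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_vv * s_xx - s_vx * s_vx
    a = (-s_av * s_xx + s_ax * s_vx) / det
    b = (-s_ax * s_vv + s_av * s_vx) / det
    return a, b


# Synthetic underdamped trajectory with known (made-up) parameters.
dt = 0.001
true_a, true_b = 4.0, 100.0
omega_d = math.sqrt(true_b - (true_a / 2.0) ** 2)
trajectory = [
    math.exp(-true_a / 2.0 * i * dt) * math.cos(omega_d * i * dt)
    for i in range(1000)
]
a_hat, b_hat = fit_damping_and_stiffness(trajectory, dt)
```

On this noise-free synthetic data the fitted values should land close to the chosen `true_a` and `true_b`. With real lip-tracking data, measurement noise would make the finite differences much less reliable, and some smoothing would likely be needed first.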



Articles published on Audio-visual Speaker Identification

BPN Based Likelihood Ratio Score Fusion for Audio-Visual Speaker Identification in Response to Noise

Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

A Visual Signal Reliability for Robust Audio-Visual Speaker Identification

Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment

Likelihood Ratio Based Score Fusion for Audio-Visual Speaker Identification in Challenging Environment

Audio–visual speaker identification using dynamic facial movements and utterance phonetic content

Audio-visual speaker identification with asynchronous articulatory feature

Performance enhancement for audio-visual speaker identification using dynamic facial muscle model
