Integrated Phoneme Subspace Method for Speech Feature Extraction

Hyunsin Park, Yasuo Ariki, Tetsuya Takiguchi

doi:10.1155/2009/690451


Open Access

https://doi.org/10.1155/2009/690451


Abstract

Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. The transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs the feature space to represent phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated on isolated-word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
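As an illustration of a data-driven linear transformation in the log mel filter bank domain, the following is a minimal PCA sketch. It is not the authors' implementation: the function names `fit_pca` and `project`, the 24 filter bank channels, the 12 retained components, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_pca(features, n_components):
    """Estimate a PCA transform from row-wise feature vectors.

    Returns the data mean and a (n_channels, n_components) basis whose
    columns are the principal axes with the largest variance.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def project(features, mean, basis):
    """Project feature vectors onto the learned subspace."""
    return (features - mean) @ basis

# Synthetic stand-in for log mel filter bank frames (assumption):
# 500 frames, 24 correlated channels.
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(500, 24)) @ rng.normal(size=(24, 24))

mean, basis = fit_pca(log_mel, n_components=12)
reduced = project(log_mel, mean, basis)
print(reduced.shape)
```

Unlike a fixed transform, the basis here is estimated from the training frames themselves, which is what "data-driven" refers to in the abstract. ICA or LDA would replace `fit_pca` with a different estimation criterion while keeping the same linear projection step.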

Highlights

In distant speech recognition, system performance decreases sharply due to the effects of reverberation.
Vowel /o/ has the largest subspace dimension (10) and consonant /p/ the smallest (2). This trend means that phoneme subspaces share correlated information.
We proposed a new speech feature extraction method that emphasizes the phonemic information in observed speech using Principal Component Analysis (PCA), the Minimum Description Length (MDL) principle, and Independent Component Analysis (ICA).
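The idea of per-phoneme subspaces that are then merged can be sketched as follows. This is one plausible reading under stated assumptions, not the method as published: each phoneme subspace is estimated by PCA, its dimension is chosen by a cumulative-variance threshold (a stand-in for the MDL criterion, whose exact form is not given here), and the subspaces are integrated by a second, uncentered PCA over the stacked basis vectors. The names `phoneme_basis` and `integrate_subspaces`, the 0.9 threshold, and the synthetic frames are hypothetical.

```python
import numpy as np

def phoneme_basis(frames, var_kept=0.9):
    """PCA basis of one phoneme's frames, shape (dim, n_channels).

    The dimension is the smallest one keeping `var_kept` of the variance.
    This threshold is an assumption standing in for MDL-based selection.
    """
    centered = frames - frames.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    dim = int(np.searchsorted(ratio, var_kept) + 1)
    return vt[:dim]

def integrate_subspaces(bases, n_components):
    """Stack all phoneme basis vectors and keep their dominant directions."""
    stacked = np.vstack(bases)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:n_components]          # orthonormal rows

# Synthetic frames for three phonemes (assumption), 24 channels each,
# with a different variance profile per phoneme.
rng = np.random.default_rng(1)
n_channels = 24
profile = np.linspace(3.0, 0.1, n_channels)
frames_by_phoneme = {
    name: rng.normal(size=(200, n_channels)) * rng.permutation(profile)
    for name in ("a", "o", "p")
}

bases = [phoneme_basis(f) for f in frames_by_phoneme.values()]
ips = integrate_subspaces(bases, n_components=12)

# Feature vector for one observed frame: its coordinates in the integrated subspace.
observed = frames_by_phoneme["o"][0]
feature = ips @ observed
print([b.shape[0] for b in bases], feature.shape)
```

Directions that appear in many phoneme subspaces receive large singular values in the stacked matrix, so the second decomposition keeps the information the subspaces have in common. An ICA-based variant would replace that second step.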

Summary

Introduction

In distant (hands-free) speech recognition, system performance decreases sharply due to the effects of reverberation. To address this problem, many studies have been carried out on feature extraction, model adaptation, and decoding. Our proposed method focuses on feature extraction. The Mel-Frequency Cepstrum Coefficient (MFCC) is a widely used speech feature. Because the feature space of an MFCC is obtained using the Discrete Cosine Transform (DCT), it does not depend directly on the speech data, so MFCCs computed from a noisy observed signal do not perform well unless noise suppression methods are applied. Other feature extraction methods include RASTA-PLP [1, 2], normalization [3, 4], and methods based on Principal Component Analysis (PCA) [5, 6, 7], Independent Component Analysis (ICA) [8, 9], and Linear Discriminant Analysis (LDA) [10].
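The remark that the DCT-based feature space does not depend on the speech data can be made concrete with a short sketch of the last step of MFCC extraction. It covers only the cosine transform of log mel filter bank energies, not framing, the FFT, or the filter bank itself. The function names, the 24 channels, and the 13 coefficients are assumptions for illustration, and the basis is left unnormalized.

```python
import numpy as np

def dct2_basis(n_channels, n_ceps):
    """Unnormalized DCT-II basis, shape (n_ceps, n_channels).

    The basis depends only on the two sizes, never on any speech data.
    """
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_channels)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_channels)

def cepstrum_from_log_mel(log_mel, n_ceps):
    """Map row-wise log mel filter bank vectors to cepstral coefficients."""
    return log_mel @ dct2_basis(log_mel.shape[1], n_ceps).T

# Synthetic stand-in for log mel filter bank frames (assumption).
rng = np.random.default_rng(2)
log_mel = rng.normal(size=(100, 24))
ceps = cepstrum_from_log_mel(log_mel, n_ceps=13)
print(ceps.shape)
```

The same `dct2_basis(24, 13)` matrix is applied whether the input is clean, noisy, or reverberant, whereas a PCA, ICA, or LDA basis is re-estimated from training data.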

Methods

Results

Conclusion


Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Jan 1, 2009
Citations: 22
License type: cc-by
