An Experimental Study on Dynamic Features of Speech Structure

Shinya Shimizu, Nobuaki Minematsu, Keikichi Hirose, Masayuki Suzuki

doi:10.2299/jsp.16.319

Open Access

https://doi.org/10.2299/jsp.16.319

Journal: Journal of Signal Processing	Publication Date: Jan 1, 2012
Citations: 1	License type: free

Affiliation: The University of Tokyo

Abstract

One of the biggest difficulties in automatic speech recognition (ASR) is how to deal with variations in speech signals caused by non-linguistic factors such as the speaker's age and gender. Various methods have been proposed to compensate for these variations, and one of them is speech structure [1]. Speech structure, which extracts only contrastive features and discards absolute features, has been proved mathematically to be transform-invariant and shown experimentally to be very robust to non-linguistic variations [2]. Although the conventional speech structure extracts both local and distant contrastive features, it does not explicitly extract the dynamic features that are supposed to exist in the contrastive features. In this paper, we reformulate speech structure based on the trajectory HMM and derive trajectory structure (TSR), in which dynamic and contrastive features can be defined and used in ASR. We carry out an n-best rescoring experiment on isolated word recognition using trajectory structure and obtain a 28.5% relative decrease in word error rate.
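
The transform-invariance claim can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speech event is a one-dimensional Gaussian and that the contrast between two events is their Bhattacharyya distance; the abstract itself does not name the distance measure, and the function names and numeric values below are hypothetical. Under these assumptions, the set of pairwise distances is unchanged by an affine map x -> a*x + b, which stands in here for a non-linguistic variation.

```python
import math
from itertools import combinations


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2); v is a variance."""
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def structure(events):
    """Pairwise distances between all events: the contrastive features."""
    return [
        bhattacharyya_1d(*events[i], *events[j])
        for i, j in combinations(range(len(events)), 2)
    ]


def affine(events, a, b):
    """Apply x -> a*x + b to every (mean, variance) pair."""
    return [(a * m + b, a * a * v) for m, v in events]


# Hypothetical (mean, variance) pairs standing in for three speech events.
events = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]
warped = affine(events, a=1.7, b=-3.0)

original = structure(events)
transformed = structure(warped)

# The absolute parameters change, but the pairwise contrasts do not.
invariant = all(
    math.isclose(x, y, rel_tol=1e-9) for x, y in zip(original, transformed)
)
print(invariant)
```

The absolute features (the means and variances) differ between `events` and `warped`, while the contrastive features returned by `structure` agree up to floating-point error. This sketch covers only static contrasts; it does not model the trajectory HMM or the dynamic features proposed in the paper.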
