Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM

Akira Sasou, Kazuyo Tanaka

doi:10.21437/icslp.2002-590

Abstract

This paper describes a method of extracting time-varying features that is effective for speech signals with high fundamental frequencies. The proposed method adopts a speech production model that consists of a Time-Varying Auto-Regressive (TVAR) process for an articulatory filter and a Hidden Markov Model (HMM) for an excitation source. The model represents waveform amplitude variations by a time-varying gain of the excitation source. The proposed algorithm extends the Viterbi algorithm so that it can adaptively estimate the TVAR coefficients and the time-varying gain while decoding the state transitions of the excitation source HMM. We applied the proposed method to extracting time-varying features from both synthetic and natural speech, and confirmed its feasibility.

1. INTRODUCTION

The conventional Linear Prediction (LP) method is widely used to analyze speech signals [1]. However, several problems still remain to be solved [2]. One such problem is that the local peaks of the LP spectral estimate are strongly biased toward the harmonics, especially for high-pitched speech. Several methods have been designed to overcome this problem [3, 4, 5, 6]. The authors have previously shown that an analysis method based on a speech production model consisting of an Auto-Regressive (AR) process for an articulatory filter and a Hidden Markov Model (HMM) for an excitation source is robust to high fundamental frequencies [7, 8]. However, this method is not suitable for analyzing continuous speech, for the following reasons. First, the AR coefficients and the HMM are iteratively estimated within every analysis frame, so a large number of operations is needed. Second, the analysis frame must be long in order to guarantee stable learning of the excitation source HMM. Third, the model parameters are assumed to be constant within the analysis frame, so the resulting parameters are averaged over such a long analysis frame.
This makes it difficult to extract the dynamic characteristics of speech when features change rapidly, as in a singing voice.

In this paper, we extend the speech production model in [7] so that the proposed model can represent the time-varying features of continuous speech. We also describe an analysis method, based on the new model, that adaptively estimates Time-Varying Auto-Regressive (TVAR) coefficients and gain. The proposed method can substantially reduce the number of operations by applying the learned HMM, and it can also extract the dynamic characteristics of continuous speech by estimating those time-varying features adaptively.

2. SPEECH PRODUCTION MODEL BASED ON TVAR-HMM

The proposed method adopts a speech production model that consists of a TVAR process for an articulatory filter and an HMM for an excitation source. The states of the HMM are connected in a ring structure in order to represent the periodicity of voiced sounds. We have previously shown that LP analysis incorporating the excitation source HMM can precisely estimate the characteristics of both the vocal tract and the excitation source from high-pitched speech signals [7, 8]. The proposed model represents the time-varying features of not only the vocal tract but also the waveform amplitude, by multiplying the excitation emitted from the HMM by a time-varying gain.
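The speech production model described above can be illustrated with a small synthesis sketch. It assumes a parameterization that this section does not spell out: each HMM state emits a Gaussian sample, each state in the ring either stays or advances to the next state, and the TVAR recursion is taken as x[n] = sum_k a_k[n] x[n-k] + g[n] u[n], where u is the HMM excitation and g the time-varying gain. The function names `sample_ring_hmm` and `tvar_synthesize` and all parameter values (including the drifting second-order resonator standing in for the vocal tract) are hypothetical.

```python
import numpy as np


def sample_ring_hmm(n_samples, means, stds, stay_prob, rng):
    """Draw an excitation sequence from a ring-topology HMM (hypothetical parameterization).

    State i either stays in i with probability stay_prob[i] or advances to (i + 1) % N,
    so the state sequence cycles around the ring, one lap per pitch period.
    Each state emits a Gaussian sample with mean means[i] and standard deviation stds[i].
    """
    n_states = len(means)
    states = np.empty(n_samples, dtype=int)
    excitation = np.empty(n_samples)
    state = 0
    for n in range(n_samples):
        states[n] = state
        excitation[n] = rng.normal(means[state], stds[state])
        if rng.random() >= stay_prob[state]:
            state = (state + 1) % n_states
    return states, excitation


def tvar_synthesize(excitation, gain, coeffs):
    """All-pole TVAR filter: x[n] = sum_k coeffs[n, k] * x[n-1-k] + gain[n] * excitation[n]."""
    n_samples, order = coeffs.shape
    x = np.zeros(n_samples)
    for n in range(n_samples):
        acc = gain[n] * excitation[n]
        for k in range(order):
            if n - 1 - k >= 0:
                acc += coeffs[n, k] * x[n - 1 - k]
        x[n] = acc
    return x


# Hypothetical example: 8-state ring with one "pulse" state per period.
rng = np.random.default_rng(0)
n_samples, n_states = 2000, 8
means = np.zeros(n_states)
means[0] = 1.0
stds = np.full(n_states, 0.01)
stay_prob = np.full(n_states, 0.5)  # expected period: n_states / (1 - 0.5) = 16 samples
states, u = sample_ring_hmm(n_samples, means, stds, stay_prob, rng)

t = np.arange(n_samples)
gain = 1.0 + 0.5 * np.sin(2 * np.pi * t / n_samples)  # slowly varying amplitude
# Second-order resonator whose pole angle drifts over time (a stand-in for a moving formant).
r = 0.95
theta = 2 * np.pi * (0.05 + 0.03 * t / n_samples)
coeffs = np.stack([2 * r * np.cos(theta), np.full(n_samples, -r**2)], axis=1)
x = tvar_synthesize(u, gain, coeffs)
```

Raising `stay_prob` lengthens the expected dwell time in each state and therefore the expected period (n_states / (1 - stay_prob) samples under these assumptions), which is one way a ring HMM can encode pitch.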
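The abstract states that the proposed algorithm extends the Viterbi algorithm to decode the state transitions of the excitation source HMM while adaptively estimating the TVAR coefficients and gain. The sketch below covers only the decoding part, under simplifying assumptions: the TVAR coefficients and gain are treated as already known, the excitation is recovered by inverse filtering, and the ring HMM has Gaussian emissions with stay/advance transitions starting in state 0. The names `tvar_residual` and `ring_viterbi` and all parameter values are hypothetical, and the adaptive coefficient and gain updates that the paper embeds in its recursion are not reproduced here.

```python
import numpy as np


def tvar_residual(x, gain, coeffs):
    """Inverse TVAR filtering: u[n] = (x[n] - sum_k coeffs[n, k] * x[n-1-k]) / gain[n]."""
    n_samples, order = coeffs.shape
    u = np.empty(n_samples)
    for n in range(n_samples):
        pred = sum(coeffs[n, k] * x[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        u[n] = (x[n] - pred) / gain[n]
    return u


def ring_viterbi(excitation, means, stds, stay_prob):
    """Most likely state path of a ring HMM with Gaussian emissions.

    Assumes the path starts in state 0 and that state i can only stay in i or
    advance to (i + 1) % N, so each step compares just two predecessors.
    """
    n_samples, n_states = len(excitation), len(means)
    log_stay = np.log(stay_prob)
    log_move = np.log1p(-stay_prob)
    # Gaussian log-likelihood of every sample under every state, shape (T, N).
    z = (excitation[:, None] - means[None, :]) / stds[None, :]
    log_emit = -0.5 * z**2 - np.log(stds)[None, :] - 0.5 * np.log(2 * np.pi)
    states = np.arange(n_states)
    prev = np.roll(states, 1)  # prev[i] == (i - 1) % N
    delta = np.full(n_states, -np.inf)
    delta[0] = log_emit[0, 0]
    back = np.zeros((n_samples, n_states), dtype=int)
    for n in range(1, n_samples):
        from_self = delta + log_stay              # i -> i
        from_prev = delta[prev] + log_move[prev]  # (i - 1) -> i
        take_prev = from_prev > from_self
        back[n] = np.where(take_prev, prev, states)
        delta = np.where(take_prev, from_prev, from_self) + log_emit[n]
    # Backtrack from the best final state.
    path = np.empty(n_samples, dtype=int)
    path[-1] = int(np.argmax(delta))
    for n in range(n_samples - 1, 0, -1):
        path[n - 1] = back[n, path[n]]
    return path


# Hypothetical example: a signal built with a first-order TVAR filter (a = 0.5, unit gain).
u_true = np.array([3.0, 3.0, 1.0, -1.0, -1.0, -3.0, 3.0, 1.0])
x = np.empty_like(u_true)
x[0] = u_true[0]
for n in range(1, len(u_true)):
    x[n] = 0.5 * x[n - 1] + u_true[n]
u_hat = tvar_residual(x, np.ones(len(x)), np.full((len(x), 1), 0.5))
path = ring_viterbi(u_hat, np.array([3.0, 1.0, -1.0, -3.0]), np.full(4, 0.1), np.full(4, 0.5))
print(path.tolist())  # [0, 0, 1, 2, 2, 3, 0, 1]
```

Because each state in the ring has only two possible predecessors, each recursion step costs O(N) in the number of states rather than the O(N^2) of a fully connected HMM.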

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM

Abstract

Talk to us

Similar Papers

Lead the way for us

Similar Papers

Nonstationary signal reconstruction from TVAR coefficients
Atahur Rahman Najeeb ... Teddy Surya Gunawan
-
Atahur Rahman Najeeb, et. al.Atahur Rahman Najeeb ... Teddy Surya Gunawan
01 Nov 2017
01 Nov 2017

Bayesian approach to parameter estimation and interpolation of time-varying autoregressive processes using the Gibbs sampler
J.J Rajan ... P.J.W Rayner
IEE Proceedings - Vision, Image, and Signal Processing | VOL. 144
J.J Rajan, et. al.J.J Rajan ... P.J.W Rayner
01 Jan 1997
IEE Proceedings - Vision, Image, and Signal Processing | VOL. 144

Nonstationary autoregressive contour modeling approach for planar shape analysis
Kie B Eom
Optical Engineering | VOL. 38
Kie B EomKie B Eom
01 Nov 1999
Optical Engineering | VOL. 38

Generalized feature extraction for time-varying autoregressive models
J.J Rajan ... P.J.W Rayner
IEEE Transactions on Signal Processing | VOL. 44
J.J Rajan, et. al.J.J Rajan ... P.J.W Rayner
01 Jan 1996
IEEE Transactions on Signal Processing | VOL. 44

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM

Abstract

Talk to us

Similar Papers