Speaker-independent phoneme alignment using transition-dependent states

John-Paul Hosom

doi:10.1016/j.specom.2008.11.003

Abstract

Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Human agreement on the location of phonemes is, on average, 93.78% within 20 ms on a variety of corpora, and 93.49% within 20 ms on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system that makes several modifications to this baseline: the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. The baseline system achieves 91.48% agreement within 20 ms on the test partition of the TIMIT corpus, and the proposed system achieves 93.36% within 20 ms on the same partition. The proposed system thus gives a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This 93.36% agreement is the best known reported result on the TIMIT corpus.
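The two kinds of figures quoted in the abstract, agreement within 20 ms and relative reduction in error, can be illustrated with a minimal Python sketch. It is not the paper's scoring procedure: the function names, the one-to-one pairing of reference and hypothesis boundaries, the inclusive 20 ms threshold, and the boundary times are all assumptions made for illustration. Only the 91.48% and 93.36% figures come from the abstract.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20.0):
    """Percentage of hypothesis boundaries within tolerance_ms of the reference.

    Assumes (hypothetically) that the two lists are already paired
    one-to-one and that the threshold is inclusive.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        abs(ref - hyp) <= tolerance_ms
        for ref, hyp in zip(reference_ms, hypothesis_ms)
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_accuracy, proposed_accuracy):
    """Relative reduction in error (percent), with error = 100 - accuracy."""
    baseline_error = 100.0 - baseline_accuracy
    proposed_error = 100.0 - proposed_accuracy
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Made-up boundary times in milliseconds, for illustration only.
reference = [120.0, 245.0, 310.0, 480.0]
hypothesis = [128.0, 270.0, 305.0, 499.0]
agreement = agreement_within(reference, hypothesis)  # 3 of 4 within 20 ms

# Accuracies reported in the abstract: baseline 91.48%, proposed 93.36%.
reduction = relative_error_reduction(91.48, 93.36)

print(f"agreement within 20 ms: {agreement:.2f}%")
print(f"relative error reduction: {reduction:.1f}%")
```

With the abstract's figures, the error falls from 8.52% to 6.64%, which is roughly a 22% relative reduction and matches the stated value.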

Journal: Speech Communication
Publication Date: Nov 27, 2008
Citations: 98

