Phoneme Error Rate Research Articles

In this paper, we propose a method of phoneme duration modeling for speech recognition. A phoneme with an extremely short or long duration often degrades recognition performance. To improve recognition performance, phoneme duration must be estimated from the various parameters that determine it. However, there has been no established duration modeling method for speech recognition that considers the influence of both speaking rate and linguistic features (phoneme location in the sentence, part of speech, etc.), which strongly influence phoneme duration. We therefore model the influence of speaking rate with a two‐dimensional normal distribution over phoneme duration and the local average of vowel duration. Each normal distribution is determined by tree‐based clustering with various questions, including questions about linguistic features. In an experiment on phoneme duration estimation with this model, we obtained a 20.8% reduction in the standard deviation of the estimation error. We also used the proposed duration model to rescore N‐best hypotheses in speech recognition. In a rescoring experiment on recognition results for spontaneous speech, we obtained a significant reduction of 4.7% in phoneme error rate.
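
The rescoring idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two‐dimensional normal distribution is evaluated or how the duration score is combined with the recognizer score, so the following are assumptions for illustration only: the duration score is the conditional log-likelihood of a phoneme's duration given the local average vowel duration, and it is added to the recognizer score with a hypothetical `weight`. The names `DurationGaussian` and `rescore` are likewise hypothetical, and the tree‐based clustering that selects a distribution per context is replaced by a plain per-phoneme lookup.

```python
import math
from dataclasses import dataclass


@dataclass
class DurationGaussian:
    """Bivariate normal over (phoneme duration d, local average vowel duration v)."""
    mean_d: float
    mean_v: float
    std_d: float
    std_v: float
    rho: float  # correlation between d and v, assumed to satisfy |rho| < 1

    def conditional_log_likelihood(self, d: float, v: float) -> float:
        """Return log p(d | v) using the standard bivariate-normal conditional."""
        cond_mean = self.mean_d + self.rho * self.std_d / self.std_v * (v - self.mean_v)
        cond_var = self.std_d ** 2 * (1.0 - self.rho ** 2)
        return -0.5 * (math.log(2.0 * math.pi * cond_var) + (d - cond_mean) ** 2 / cond_var)


def rescore(hypotheses, models, weight=1.0):
    """Re-rank N-best hypotheses by recognizer score plus weighted duration score.

    hypotheses: list of (recognizer_log_score, segments), where segments is a
                list of (phoneme, duration, local_avg_vowel_duration) tuples.
    models:     dict mapping phoneme label to a DurationGaussian (assumed lookup).
    Returns a list of (combined_score, segments), best first.
    """
    scored = []
    for recognizer_score, segments in hypotheses:
        duration_score = sum(
            models[p].conditional_log_likelihood(d, v) for p, d, v in segments
        )
        scored.append((recognizer_score + weight * duration_score, segments))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

With equal recognizer scores, a hypothesis whose segment durations fit the speaking rate implied by the local vowel average ranks above one containing an implausibly short or long phoneme.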

In this paper, a new method of syntactic analysis for speech recognition is presented. It is based on a novel algorithm which uses a formal model of grammatical structure to correct errors resulting from ambiguities not resolved by acoustic analysis of the continuous speech signal. The algorithm is unique in that it can correct not only misclassifications of symbols (substitution errors) but also errors due to the presence of extra symbols and the omission of others (segmentation errors). It is proven that the algorithm is optimal in the sense of maximum likelihood, and it is further shown that the algorithm is efficient, having a worst-case execution time proportional to the length of the input and the square of an appropriate measure of the grammar size. An upper bound on the memory requirements is proven to be linear in both quantities. The algorithm has been tested on 496 Japanese phrases averaging 20 phonemes in length and spoken fluently by four male speakers. Based on input from an acoustic processor with an overall phoneme error rate of 40%, roughly equally divided between misclassifications and segmentation errors, a correct phrase recognition rate of 62% was obtained.
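
Neither abstract states how its phoneme error rate was computed. As a sketch under the conventional definition (an assumption here, not something taken from the papers), the rate is the number of substitutions, deletions, and insertions in a minimum-edit alignment divided by the number of reference phonemes. The function names below are hypothetical. When several alignments share the minimum cost, the split between error types depends on tie-breaking, although the total does not.

```python
def phoneme_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total_edits, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference phoneme
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis phoneme
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (t, s, d, k)          # match, no cost
            else:
                diag = (t + 1, s + 1, d, k)  # substitution (misclassification)
            t, s, d, k = cost[i - 1][j]
            up = (t + 1, s, d + 1, k)        # deletion (omitted phoneme)
            t, s, d, k = cost[i][j - 1]
            left = (t + 1, s, d, k + 1)      # insertion (extra phoneme)
            cost[i][j] = min(diag, up, left, key=lambda c: c[0])
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def phoneme_error_rate(reference, hypothesis):
    """Total edit errors divided by the number of reference phonemes."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    return sum(phoneme_errors(reference, hypothesis)) / len(reference)
```

In this breakdown, substitutions correspond to what the abstract calls misclassifications, while deletions and insertions together correspond to its segmentation errors.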

Articles published on Phoneme Error Rate

A phoneme duration model considering speaking‐rate and linguistic features for speech recognition

Segmental Intelligibility of Three Text-to-Speech Synthesis Methods in Reverberant Environments

The intelligibility of sentences in which multiple segments are replaced by noise bursts

Pitch dependent phone modelling for HMM-based speech recognition.

Maximum likelihood parsing of speech in the presence of segmentation errors
