To provide an acoustic model which can absorb the fluctuation of a phonemic environment in an interval longer than a syllable, with the number of parameters of the acoustic model suppressed to be small, a phoneme-connected syllable HMM/syllable-connected HMM set is generated in such a way that a phoneme-connected syllable HMM set corresponding to individual syllables is generated by combining phoneme HMMs. A preliminary experiment is conducted using the phoneme-connected syllable HMM set and training speech data. Any misrecognized syllable and the preceding syllable of the misrecognized syllable are checked using results of a preliminary experiment syllable label data. The combination between a correct answer syllable for the misrecognized syllable and the preceding syllable of the misrecognized syllable is extracted as a syllable connection. A syllable-connected HMM corresponding to this syllable connection is added into the phoneme-connected syllable HMM set. The resulting phoneme-connected syllable HMM set is trained using the training speech data and the syllable label data.
Read full abstract