Abstract

This paper describes an experimental interactive system featuring: (1) highly accurate speaker-independent, large-vocabulary speech recognition based on accurate context-dependent acoustic phoneme HMMs trained on speech data from more than 10,000 speakers collected over a telephone network; (2) high-quality text-to-speech synthesis that generates speech by concatenating triphone-context-dependent waveform segments; (3) a software-based configuration that requires no special hardware beyond a PC equipped with a sound board and a voice modem; and (4) easy, rapid prototyping that lets the developer build a system by writing two types of service scenarios.
