ASR and TTS telecommunications applications in Japan

Mikio Kitai, Kazuo Hakoda, Shigeki Sagayama, Tomokazu Yamada, Hajime Tsukada, Satoshi Takahashi, Yoshiaki Noda, Jun-Ichi Takahashi, Yuki Yoshida, Kazuhiro Arai, Takashi Imoto, Tomohisa Hirokawa

doi:10.1016/s0167-6393(97)00044-7

Abstract

This paper first describes recent trends in ASR and TTS telecommunications applications in Japan. ASR applications focus on public services such as operator automation, operator assistance, voice-activated information retrieval, and voice dialing. Major TTS applications include voice information services and e-mail reading. Use of ASR and TTS functions is expected to increase dramatically in the near future as portable and mobile telephone terminals become widespread; hot topics are text broadcasting and digital communication. Second, this paper describes NTT's experimental interactive system, which features (1) highly accurate speaker-independent, large-vocabulary speech recognition based on context-dependent phoneme HMM acoustic models trained with speech data collected over the telephone network from more than 10,000 speakers, (2) high-quality text-to-speech synthesis that generates speech by concatenating triphone-context-dependent waveform segments, (3) a software-based configuration that requires no special hardware beyond a PC equipped with a sound board and a voice modem, and (4) easy and rapid prototyping that lets the developer build a system by writing several types of service scenarios.
