Abstract

Berkeley Speech Technologies, Inc. has been developing commercial text‐to‐speech (TTS) synthesis technology for over 10 yr. What started out as a quick “technology transfer” has grown into a complex body of “intellectual property.” It has been realized in products such as a 100 000‐word talking dictionary, a telephone response system with 16 TTS lines on one board, a satellite communication system for trucks, and a portable talking computer for blind users. Practical considerations forced changes to the initial theoretical assumptions. From the beginning, it was assumed that high intelligibility and high phoneme accuracy were essential. It was soon learned, however, that blind users consider a speaking rate of 700 words per minute and a 25‐ms start and stop time equally important. Similarly, academic research had assumed wide bandwidth and low noise, but telephone systems require that all of the speech information be packed into a 3.5‐kHz telephone bandwidth. Initially, demi‐syllable synthesis was chosen because it seemed to be an “engineering shortcut” that might cover gaps in standard scientific descriptions. As the technology developed, however, the decision was made to convert to a more scientifically based synthesis model. That model offered higher quality, greater flexibility, and faster development, especially of new languages. Our 10‐yr development effort could not have been justified on the basis of expected financial return. However, it was and is fun.
