Subjective Evaluation of Techniques for Proper Name Pronunciation

R. I. Damper, T. Soonklang

doi:10.1109/tasl.2007.904192

Abstract

Automatic pronunciation of unknown words of English is a hard problem of great importance in speech technology. Proper names constitute an especially difficult class of words to pronounce because of their variable origin and uncertain degree of assimilation of foreign names to the conventions of the local speech community. In this paper, we compare four different methods of proper name pronunciation for English text-to-speech (TTS) synthesis. The first (intended to be used as the primary strategy in a practical TTS system) uses a set of manually supplied pronunciations, referred to as the "dictionary" pronunciations. The remainder are pronunciations obtained from three different data-driven approaches (intended as candidates for the back-up strategy in a real system) which use the dictionary of "known" proper names to infer pronunciations for unknown names. These are: pronunciation by analogy (PbA), a decision tree method (CART), and a table look-up method (TLU). To assess the acceptability of the pronunciations to potential users of a TTS system, subjective evaluation was carried out, in which 24 listeners rated 1200 synthesized pronunciations of 600 names by the four methods using a five-point (opinion score) scale. From over 50 000 proper names and their pronunciations, 150 so-called one-of-a-kind pronunciations were selected for each of the four methods (600 in total). A one-of-a-kind pronunciation is one for which one of the four methods disagrees with the other three methods, which agree among themselves. Listener opinions on one-of-a-kind pronunciations are argued to be a good measure of the overall quality of a particular method. For each one-of-a-kind pronunciation, there is a corresponding so-called rest pronunciation (another 600 in total), on which the remaining three competitor methods agree, for which listener opinions are taken to be indicative of the general quality of the competition.
Nonparametric tests of significance of mean opinion scores show that the dictionary pronunciations are rated superior to the automatically inferred pronunciations with little difference between the data-driven methods for the one-of-a-kind pronunciations, but for the rest pronunciations there is suggestive evidence that PbA is superior to both CART and TLU, which perform at approximately the same level.
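
The one-of-a-kind selection criterion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `one_of_a_kind`, the dictionary-of-strings representation, and the phoneme strings in the example are all assumptions made for illustration. The sketch only checks the stated condition, namely that exactly one of the four methods disagrees with the other three, which agree among themselves.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def one_of_a_kind(prons: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (odd_method, odd_pronunciation, rest_pronunciation) when exactly
    one of four methods disagrees with the other three, which agree among
    themselves. Return None for any other pattern of agreement.

    `prons` maps a method name to its pronunciation string (hypothetical format).
    """
    if len(prons) != 4:
        return None
    counts = Counter(prons.values())
    # A one-of-a-kind case has exactly two distinct pronunciations,
    # one produced by a single method and one shared by the other three.
    if sorted(counts.values()) != [1, 3]:
        return None
    rest_pron = next(p for p, c in counts.items() if c == 3)
    odd_method = next(m for m, p in prons.items() if p != rest_pron)
    return odd_method, prons[odd_method], rest_pron


# Illustrative input only: these phoneme strings are invented, not taken
# from the paper's data.
example = {
    "dictionary": "k ow k",
    "PbA": "k aa k",
    "CART": "k aa k",
    "TLU": "k aa k",
}
result = one_of_a_kind(example)
```

Here the dictionary entry would be the one-of-a-kind pronunciation and the string shared by PbA, CART, and TLU would be the corresponding rest pronunciation. Names on which the methods split two against two, or on which all four agree, yield `None` and would not be candidates under this criterion.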

Full Text