Influence of an internal reference system and cross-modality matching on the subjective rating of speech synthesizers

Chaslav V Pavlovic, Robert Espesser, Mario Rossi

doi:10.1121/1.405295

Open Access

Abstract

Previous studies concluded that the contextual invariance and subject invariance of categorical and magnitude estimates of speech quality could be improved by introducing a reference system and normalizing the results with respect to it. The reference signal used in those studies was natural speech. Such a reference system may present problems in applications where synthesizers are compared across languages, in particular because it is difficult to ensure equal subjective quality of different talkers in different languages. This study investigates the possibility of substituting the actual reference signal with an "internal" reference, defined to the subject as the system of optimal quality. A second objective is to explore whether the sometimes difficult task of free number production required in magnitude estimation could be replaced by cross-modality matching, in which subjects produce lines of various lengths on a computer screen. The main concern here was the unknown effect of the limited width of the computer screen on the magnitude estimation task. [This research was made possible by Grant No. 2589 from the EEC Esprit SAM project.]
