A current controversy in the interactive voice response (IVR) community is whether, and under which conditions, designers should use recorded audio when portions of the interface must be generated by text-to-speech (TTS). The purpose of this study was to examine user preferences for an extreme case: a prompt that incorporates multiple units of dynamic information in a single sentence. Two groups of IBM employees listened to and compared two auditory styles of information presentation: all information given by a single TTS voice, and recorded audio alternating with the TTS voice. The groups listened to both presentation styles in counterbalanced order and then indicated their preference and degree of preference. The percentage of respondents indicating a preference for the all-TTS style was significantly greater than the percentage indicating a preference for the mixture of recorded audio and TTS.