Abstract

Two sets of Japanese speech samples were analyzed to clarify the effects of speaking style on prosodic parameters. The first set, 4068 isolated-word utterances, consists of 308 different Japanese words uttered in seven different ways: normal, slow, fast, strong, weak, high, and low. The second set, 110 conversational utterances, consists of 11 different Japanese sentences uttered in four conversational styles (normal, tender, restless, and irritated) and in one normal reading style. The samples were uttered by two professional narrators, and their prosodic parameters [fundamental frequency (F0), power, and segmental duration] were compared. The analysis revealed the following tendencies: (1) For most ordinary speaking styles (all but “high”), F0 and power vary together; correlations of 0.81 to 0.99 were measured in these cases. (2) F0 can, however, be controlled independently of power when speakers are instructed to do so (e.g., speaking style “high”). (3) The average values and the dynamic ranges of F0 patterns differ between speaking styles. To reproduce these tendencies in speech synthesis by rule, an F0 control model is proposed. The system parameters of this model were derived by applying analysis by synthesis to all samples in the first set at once. The results confirm that the proposed model can systematically generate F0 patterns for various speaking styles.
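
As an illustration of finding (1), a correlation between F0 and power can be computed as a Pearson coefficient over paired measurements. The abstract does not state over which units the reported correlations were taken, so the sketch below assumes one average F0 value and one average power value per utterance. The function name `pearson` and the sample numbers are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented per-utterance averages, for illustration only (not study data).
f0_hz = [110.0, 125.0, 140.0, 160.0]
power_db = [58.0, 61.0, 63.0, 67.0]

r = pearson(f0_hz, power_db)
print(f"F0/power correlation: {r:.2f}")
```

A coefficient near 1 would correspond to F0 and power rising and falling together, as reported for most styles. A style such as “high”, where F0 is raised without a matching change in power, would be expected to give a weaker coefficient.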
