Abstract
Two sets of Japanese speech samples were analyzed to clarify how speaking style affects prosodic parameters. The first set, 4068 isolated-word utterances, consists of 308 different Japanese words uttered in seven different styles: normal, slow, fast, strong, weak, high, and low. The second set, 110 conversational utterances, consists of 11 different Japanese sentences uttered in four different conversational styles (normal, tender, restless, and irritated) and in one normal reading style. The samples were uttered by two professional narrators, and their prosodic parameters [fundamental frequency (F0), power, and segmental duration] were compared. The analysis revealed the following tendencies: (1) For most ordinary speaking styles (all except “high”), F0 and power vary together; correlations of 0.81 to 0.99 were measured in these cases. (2) F0 can, however, be controlled independently of power when speakers are instructed to do so (e.g., the “high” speaking style). (3) The average values and the dynamic ranges of F0 patterns differ between speaking styles. To realize these tendencies in speech synthesis by rule, an F0 control model is proposed. By applying analysis-by-synthesis to all samples in the first set simultaneously, the system parameters of this model were derived. It was confirmed that the proposed model can systematically generate F0 patterns for various speaking styles.
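The two kinds of measurement summarized above (the correlation between F0 and power, and the average value and dynamic range of an F0 pattern) can be sketched as follows. This is a minimal illustration, not the authors' analysis procedure, and it does not reproduce the proposed F0 control model. It assumes F0 contours are given in Hz with 0 marking unvoiced frames, summarizes the average F0 as a geometric mean, and expresses dynamic range in semitones; these conventions are assumptions made here for illustration.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and variances (the common 1/n factors cancel in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def f0_mean_and_range(f0_hz: Sequence[float]) -> Tuple[float, float]:
    """Return (geometric-mean F0 in Hz, dynamic range in semitones).

    Assumption: frames with F0 <= 0 are unvoiced and are ignored.
    """
    logs = [math.log2(f) for f in f0_hz if f > 0]
    if not logs:
        raise ValueError("no voiced frames")
    mean_hz = 2 ** (sum(logs) / len(logs))      # mean taken in the log domain
    range_st = 12 * (max(logs) - min(logs))     # 12 semitones per octave
    return mean_hz, range_st
```

For example, `pearson_r` could be applied to paired per-style shifts in mean F0 and mean power relative to the normal style, and `f0_mean_and_range` to each utterance's F0 contour before comparing styles. Any such input values would be hypothetical; the abstract does not specify exactly which quantities were correlated.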