Abstract
Aiming at natural F0 control in conversational speech synthesis based on the attributes of the constituent output words, we analyze F0 characteristics from both the generation and the perception viewpoints. We recorded commonly used two-phrase utterances, each consisting of a Japanese adverb phrase and an adjective phrase, in which the adverbs express different degrees of markedness, under designed conversational situations, and compared their F0 characteristics. The comparison showed that F0 control depends consistently not only on the adverbs themselves but also on the attribute of the following adjective phrase. A strong positive or negative correlation is observed between the markedness of the adverb and the F0 height when the adverb phrase is followed by an adjective phrase expressing positiveness or negativeness. These consistencies were perceptually confirmed by naturalness evaluation tests using the same two-phrase samples with different F0 heights. Finally, we propose a computational model of conversational F0 control that uses lexical information: adjectives expressing positiveness or negativeness, and adverbs expressing markedness. F0 estimation experiments quantitatively indicated the potential of F0 control for natural conversational speech synthesis based on the attributes of the constituent output words.