Abstract

The use of acoustic–prosodic features related to F0, duration and voice quality is proposed and evaluated for automatic extraction of paralinguistic information (intentions, attitudes, and emotions) in dialogue speech. Perceptual experiments and acoustic analyses were conducted for monosyllabic interjections spoken in several speaking styles, conveying a variety of paralinguistic information. Experimental results indicated that the classical prosodic features, i.e., F0 and duration, were effective for discriminating groups of paralinguistic information expressing intentions, such as affirm, deny, filler, and ask for repetition, and accounted for 57% of the global detection rate, in a task of discriminating seven groups of paralinguistic information. On the other hand, voice quality features were effective for identifying part of the paralinguistic information expressing emotions or attitudes, such as surprised, disgusted and admired, leading to a 12% improvement in the global detection rate.


