Abstract
In this paper, we study the effectiveness of prosodic features for speaker verification. We hypothesize that prosody is linked to linguistic units such as syllables, and that prosodic features can be better represented with reference to the syllabic sequence. To extract prosodic features, speech is segmented into syllable-like regions using knowledge of the vowel onset points (VOPs). We use a technique based on excitation source information to detect VOPs automatically. The locations of the VOPs serve as references for extracting prosodic features directly from the speech signal. Various parameters are used to represent the pitch and energy dynamics of the region between two consecutive VOPs. The effectiveness of the derived prosodic features for speaker verification is demonstrated on the NIST SRE 2003 extended data. The complementary nature of prosodic and spectral features helps to improve the accuracy of the combined speaker verification system.
Index Terms: prosody, speaker verification, syllable, vowel onset point, F0 contour.
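As an illustration of VOP-anchored feature extraction, the following minimal sketch summarizes the pitch and energy dynamics of each region between two consecutive VOPs. It assumes the F0 contour, the frame energies, and the VOP frame indices have already been computed; the abstract does not list the parameters actually used, so the function name `segment_prosody` and the chosen features (mean F0, F0 change, F0 slope, energy change, duration) are hypothetical stand-ins rather than the authors' feature set.

```python
import numpy as np

def segment_prosody(f0, energy, vop_frames):
    """Summarize pitch and energy dynamics between consecutive VOPs.

    f0, energy : per-frame sequences of equal length (f0 == 0 marks unvoiced frames).
    vop_frames : increasing frame indices of detected vowel onset points.
    Returns one feature dict per region between two consecutive VOPs.
    The feature set is an assumption made for illustration only.
    """
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    features = []
    for start, end in zip(vop_frames[:-1], vop_frames[1:]):
        seg_f0 = f0[start:end]
        seg_en = energy[start:end]
        voiced = seg_f0[seg_f0 > 0]  # keep voiced frames only
        if voiced.size < 2:
            continue  # too few voiced frames to describe a contour
        # Slope of a least-squares line through the voiced F0 values
        t = np.arange(voiced.size)
        slope = np.polyfit(t, voiced, 1)[0]
        features.append({
            "duration_frames": int(end - start),
            "f0_mean": float(voiced.mean()),
            "f0_change": float(voiced[-1] - voiced[0]),
            "f0_slope": float(slope),
            "energy_change": float(seg_en[-1] - seg_en[0]),
        })
    return features
```

Calling `segment_prosody(f0, energy, vop_frames)` yields one fixed-length description per syllable-like region. A sequence of such descriptions could then be modeled per speaker, though the modeling and score combination steps are outside the scope of this sketch.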