Abstract

In this paper, we describe a technique for evaluating the quality of mimicked speech; that is, mimicry artists are evaluated on their ability to mimic a particular person. The evaluation is based on prosodic characteristics in the text-dependent case. Prosodic characteristics are represented using features derived from the pitch contour, duration and energy. Prosodic features are extracted after the speech is automatically segmented into intonational phrases. The pitch contour of each phrase is approximated by a weighted sum of Legendre polynomials. The prosodic feature set comprises the weights of the first four Legendre polynomials (w0k, w1k, w2k, w3k), average jitter, average shimmer, voiced duration, total duration and change in energy of each intonational phrase. The effectiveness of the technique is demonstrated on a text-dependent database of mimicked speech. Evaluation is done by dynamic time warping of the prosodic features derived from the mimicked speech and the original speech. The scores obtained from this evaluation are compared with the results of manual perception/listening tests, and the comparison clearly indicates the effectiveness of the proposed technique.
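
The two computational steps named in the abstract, fitting Legendre polynomial weights to a phrase's pitch contour and comparing feature sequences by dynamic time warping, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `legendre_weights` and `dtw_distance` are hypothetical, the pitch contour is assumed to be uniformly sampled and mapped onto [-1, 1], the fit is assumed to be ordinary least squares, and the local DTW cost is assumed to be the Euclidean distance between per-phrase feature vectors. The abstract does not specify any of these choices.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_weights(f0, degree=3):
    """Least-squares weights of Legendre polynomials P0..P_degree for one phrase.

    Assumption: f0 is a uniformly sampled pitch contour (voiced frames only),
    placed on the interval [-1, 1] where Legendre polynomials are orthogonal.
    """
    f0 = np.asarray(f0, dtype=float)
    x = np.linspace(-1.0, 1.0, len(f0))
    # legfit returns coefficients ordered from degree 0 upward.
    return legendre.legfit(x, f0, degree)


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    a and b have shapes (n_phrases_a, n_features) and (n_phrases_b, n_features).
    Assumption: Euclidean local cost with the standard three-way step pattern.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
    return cost[n, m]


if __name__ == "__main__":
    # Synthetic contours stand in for real pitch tracks (illustration only).
    t = np.linspace(-1.0, 1.0, 50)
    original = [legendre_weights(120 + 15 * t), legendre_weights(110 - 10 * t)]
    mimicked = [legendre_weights(125 + 12 * t), legendre_weights(112 - 8 * t)]
    print(dtw_distance(original, mimicked))
```

In this sketch a lower accumulated cost means the mimicked prosody follows the original more closely. A fuller feature vector would append jitter, shimmer, duration and energy values to the four weights, and would likely need per-feature normalization so that no single feature dominates the Euclidean cost.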

