Abstract

Modification of suprasegmental features such as pitch and duration of original speech by fixed scaling factors is referred to as static prosody modification. In dynamic prosody modification, the prosodic scaling factors (time-varying modification factors) are defined for all the pitch cycles present in the original speech. The present work is focused on improving the naturalness of the prosody modified speech by reducing the generation of piecewise constant segments in the modified pitch contour. The prosody modification is performed by anchoring around the accurate instants of significant excitation estimated from the original speech. The division of longer pitch intervals into many equal intervals over long speech segments introduces step-like discontinuities in the form of piecewise constant segments in the modified pitch contours. The effectiveness of proposed dynamic modification method is initially confirmed from the smooth modified pitch contour plot obtained for finer static prosody scaling factors, waveforms, spectrogram plots and comparison subjective evaluations. Also, the average $$F_0$$F0 jitter computed from the pitch segments of each glottal activity region in the modified speech is proposed as an objective measure for the prosody modification. The naturalness of the prosody modified speech using the proposed method is objectively and subjectively compared with that of the existing zero frequency filtered signal-based dynamic prosody modification. Also, the proposed algorithm effectively preserves the dynamics of the prosodic patterns in singing voices where in the $$F_0$$F0 parameters rapidly and continuously fluctuate within a higher $$F_0$$F0 range.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call