Abstract

Dynamic time warping (DTW) is the standard algorithm for aligning non-uniform time series. In emotional voice conversion, however, traditional DTW tends to over-align parallel emotional speech sequences and to match them poorly at the local level, and the resulting alignment degrades the quality and naturalness of the final emotional conversion. To address this, we propose ShapeDTW++, an alignment algorithm for emotional speech sequences. ShapeDTW++ extends DTW with the shape descriptors introduced by the ShapeDTW algorithm, which improves local matching, and it further adds a cumulative-distance loss weight and relaxed endpoints to reduce over-alignment. Experiments show that ShapeDTW++ performs comparably to ShapeDTW while improving local matching and reducing over-alignment on emotional speech sequences, and that it significantly outperforms traditional DTW in objective evaluations of emotional speech alignment.
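To make the shape-descriptor idea concrete, the sketch below shows plain DTW computed over local shape descriptors rather than raw sample values, in the style of ShapeDTW. This is a minimal illustration, not the paper's implementation: the raw-subsequence descriptor, the window width, and the function names are assumptions, and the cumulative-distance loss weight and relaxed endpoints of ShapeDTW++ are omitted.

```python
import numpy as np

def shape_descriptors(seq, width=5):
    """Raw-subsequence shape descriptor (an assumed, simple choice):
    each point is represented by its edge-padded local neighborhood."""
    pad = width // 2
    padded = np.pad(seq, pad, mode="edge")
    return np.stack([padded[i:i + width] for i in range(len(seq))])

def shape_dtw(x, y, width=5):
    """DTW over shape descriptors: returns alignment cost and warping path."""
    dx, dy = shape_descriptors(x, width), shape_descriptors(y, width)
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between descriptors, not raw samples,
    # so each match is scored on local shape context.
    cost = np.linalg.norm(dx[:, None, :] - dy[None, :, :], axis=2)
    # Standard DTW cumulative-cost recurrence.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]
```

Aligning a sequence with itself yields zero cost and a purely diagonal path; on distorted pairs, scoring matches by local shape rather than single samples is what discourages the pathological one-to-many matches that plain DTW produces.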
