Abstract
Many voice conversion algorithms are based on frame-wise mapping from source features into target features. This ignores the inherent temporal continuity that is present in speech and can degrade the subjective quality. In this paper, we propose to optimize the speech feature sequence after a frame-based conversion algorithm has been applied. In particular, we select the sequence of speech features through the minimization of a cost function that involves both the conversion error and the smoothness of the sequence. The estimation problem is solved using sequential Monte Carlo methods. Both subjective and objective results show the effectiveness of the method.
Index Terms: voice conversion, maximum a posteriori, Viterbi algorithm, smoothing, particle filter
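As a rough illustration of a cost that trades conversion error against smoothness, the sketch below scores a candidate feature sequence against the frame-wise converted output. The quadratic form, the name `sequence_cost`, the weight `smooth_weight`, and the toy data are all assumptions made for illustration; the abstract does not specify the exact cost, and the sequential Monte Carlo search over candidate sequences is not shown.

```python
def sequence_cost(candidate, converted, smooth_weight=1.0):
    """Conversion error plus a weighted smoothness penalty (assumed quadratic form).

    candidate, converted: lists of feature vectors (lists of floats), one per frame.
    """
    if len(candidate) != len(converted):
        raise ValueError("sequences must have the same number of frames")

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Conversion error: how far each candidate frame is from the frame-wise output.
    conversion_error = sum(sq_dist(c, v) for c, v in zip(candidate, converted))
    # Roughness: how much the candidate changes between consecutive frames.
    roughness = sum(
        sq_dist(candidate[t], candidate[t - 1]) for t in range(1, len(candidate))
    )
    return conversion_error + smooth_weight * roughness


# Toy one-dimensional data (hypothetical, not from the paper).
converted = [[0.0], [1.0], [0.0]]  # jittery frame-wise output
smoothed = [[0.0], [0.5], [0.0]]   # candidate that stays closer to its neighbours

print(sequence_cost(converted, converted))  # 2.0
print(sequence_cost(smoothed, converted))   # 0.75
```

In this toy case the smoothed candidate accepts a small conversion error in exchange for a larger drop in roughness, so its total cost is lower. How the two terms are actually weighted is a design choice the abstract leaves open.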