Abstract

A singing-voice synthesis method that transforms a speaking voice into a singing voice using STRAIGHT is proposed. The method comprises three models: an F0 control model, a spectral sequence control model, and a duration control model. These models were constructed by analyzing, through psychoacoustic experiments, the characteristics of each acoustic feature that affects singing-voice perception. The F0 control model generates a singing-voice F0 contour by accounting for four F0 fluctuations that affect the naturalness of a singing voice: overshoot, vibrato, preparation, and fine (unsteady) fluctuation. The spectral sequence control model modifies the speaking-voice spectral shape into a singing-voice spectral shape by controlling the singer's formant, a prominent peak in the spectral envelope at around 3 kHz, and the amplitude modulation of formants synchronized with vibrato. The duration control model stretches each speaking-voice phoneme duration into a singing-voice phoneme duration based on the note duration. Results show that the proposed method can synthesize a natural singing voice whose sound quality resembles that of an actual singing voice.
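
To make two of the ideas above concrete, the following minimal sketch builds a step-wise note-level F0 contour, superimposes vibrato on it, and stretches phoneme durations to fit a note. It is an illustration under stated assumptions, not the models proposed here: the function names, the frame rate, the vibrato rate and depth, the purely sinusoidal vibrato shape, and the uniform duration scaling are all hypothetical choices made for the example. Overshoot, preparation, fine fluctuation, and the spectral control are not modeled.

```python
import math


def note_f0_contour(notes, frame_rate=200.0):
    """Build a step-wise F0 contour (Hz) from (frequency_hz, duration_s) notes.

    frame_rate (frames per second) is an assumed value for this sketch.
    """
    contour = []
    for freq_hz, dur_s in notes:
        n_frames = round(dur_s * frame_rate)
        contour.extend([freq_hz] * n_frames)
    return contour


def add_vibrato(contour, frame_rate=200.0, rate_hz=5.5, depth_cents=50.0):
    """Superimpose a sinusoidal vibrato on an F0 contour.

    The modulation is applied in cents (log-frequency), so the deviation is
    symmetric in pitch. rate_hz and depth_cents are illustrative assumptions,
    not values taken from the abstract.
    """
    out = []
    for i, f0 in enumerate(contour):
        t = i / frame_rate
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f0 * 2.0 ** (cents / 1200.0))
    return out


def stretch_durations(phoneme_durs, note_dur):
    """Scale speaking-voice phoneme durations so that they sum to note_dur.

    Uniform scaling is a simplifying assumption; the abstract only states
    that stretching is based on the note duration.
    """
    total = sum(phoneme_durs)
    if total <= 0:
        raise ValueError("phoneme durations must sum to a positive value")
    scale = note_dur / total
    return [d * scale for d in phoneme_durs]


if __name__ == "__main__":
    # Two notes: A3 for 0.5 s, then A4 for 0.25 s.
    flat = note_f0_contour([(220.0, 0.5), (440.0, 0.25)])
    sung = add_vibrato(flat)
    print(len(sung), min(sung), max(sung))
    print(stretch_durations([0.05, 0.10, 0.05], note_dur=0.4))
```

In a complete system, the resulting F0 contour and phoneme durations would be passed to an analysis-synthesis vocoder such as STRAIGHT together with the modified spectral envelope; that step is outside the scope of this sketch.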
