Spectral modification technique in conversion of musical notes or tempos for singing voice synthesis system

Hideki Banno, Fumitada Itakura, Kumi Ohta, Masato Kawazoe

doi:10.1121/1.4787136

Abstract

To develop a high‐quality singing voice synthesis system, the STRAIGHT time‐frequency representations (spectrograms) of singing voice signals at various musical notes and tempos were examined. STRAIGHT is a very high‐quality analysis‐synthesis system, and its spectrogram represents vocal tract information accurately. Based on these observations, a system that converts the musical note or tempo of an input singing voice signal has been implemented, and frequency warping of the STRAIGHT spectrogram based on a DP (dynamic programming) matching algorithm has been introduced into it. Using the differential of a smoothed spectrum as the spectral distance measure in the frequency warping was found to produce subjectively better quality than using the smoothed spectrum directly. However, conversion without spectral modification, i.e., with pitch/tempo modification only, was found to produce better quality than the method using the differential of a smoothed spectrum. A possible cause is that the method using the differential of a smoothed spectrum destroys the naturalness of the voice.
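The abstract does not give the details of the DP matching, so the following is only a minimal sketch of the general idea, under stated assumptions: a standard dynamic-programming alignment along the frequency axis of a single frame, with `np.gradient` standing in for the "differential of a smoothed spectrum". The function name `dp_frequency_warp`, the `use_delta` flag, the step pattern, and the synthetic spectra are all hypothetical and are not taken from the paper; no STRAIGHT analysis is performed here.

```python
import numpy as np

def dp_frequency_warp(src, tgt, use_delta=True):
    """Warp the frequency axis of `src` onto `tgt` by DP matching (illustrative only).

    src, tgt  : 1-D smoothed (e.g. log-magnitude) spectra of one frame.
    use_delta : if True, match on the bin-to-bin differential of each spectrum
                (assumed stand-in for the paper's distance measure);
                if False, match on the spectral values directly.
    Returns (warped_src, warp), where warp[j] is the source bin used for target bin j.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    f_src = np.gradient(src) if use_delta else src
    f_tgt = np.gradient(tgt) if use_delta else tgt
    n, m = len(src), len(tgt)

    # Local squared distance between every source bin i and target bin j.
    local = (f_src[:, None] - f_tgt[None, :]) ** 2

    # Accumulated cost with the basic step pattern (i-1, j), (i, j-1), (i-1, j-1).
    acc = np.full((n, m), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = local[i, j] + best

    # Backtrack from the last bin pair to the first to recover the matching path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))

    # For each target bin, use the (rounded) mean of the source bins matched to it.
    sums = np.zeros(m)
    counts = np.zeros(m)
    for i, j in path:
        sums[j] += i
        counts[j] += 1
    warp = np.rint(sums / counts).astype(int)
    return src[warp], warp

# Synthetic example: one spectral peak at bin 20 (source) and at bin 30 (target).
bins = np.arange(64)
src = np.exp(-0.5 * ((bins - 20) / 4.0) ** 2)
tgt = np.exp(-0.5 * ((bins - 30) / 4.0) ** 2)
warped_delta, _ = dp_frequency_warp(src, tgt, use_delta=True)
warped_direct, _ = dp_frequency_warp(src, tgt, use_delta=False)
```

In this toy case both distance measures move the source peak onto the target peak; the quality differences reported in the abstract concern real singing voice and subjective evaluation, which this sketch does not attempt to reproduce.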
