Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because of its similarity to speech in terms of generating sounds from symbols. Moreover, a plucked string instrument, such as electric bass guitar (EBG), shares acoustical similarities with speech. We propose an attack-sustain (AS) representation of the playing technique to take advantage of this similarity. The AS representation treats the attack segment as an unvoiced consonant and the sustain segment as a voiced vowel. In addition, we propose a MISS framework for an EBG that can control its playing techniques: (1) we constructed a EBG sound database containing a rich set of playing techniques, (2) we developed a dynamic time warping and timbre conversion to align the sounds and AS labels, (3) we extend an existing MISS framework to control playing techniques using AS representation as control symbols. The experimental evaluation suggests that our AS representation effectively controls the playing techniques and improves the naturalness of the synthetic sound.
Read full abstract