EXTENDED ABSTRACT: Gestural control of speech or singing synthesis is difficult because of the very fast articulator motions involved. Singing adds the further difficulty of accurate rhythmic coordination: syllables must coincide with musical beats at a given tempo, so the precise location of the beat must be controlled for each syllable. Two singing synthesis instruments are presented: Cantor Digitalis and Digitartic. Both instruments use bimanual writing or drawing gestures on graphic tablets. The voice signal is computed by a parametric synthesizer that includes a voice source model, consonantal noise models and series/parallel formant filters. Cantor Digitalis is a vowel and semi-vowel singing instrument. Digitartic is an extension of Cantor Digitalis that allows for singing syllables, including plosive, fricative, liquid and nasal consonants. For each mode of articulation, any place of articulation between the canonical ones can be obtained by linear interpolation of the consonant parameters. This paper focuses on Digitartic, and in particular on the synchronization of consonant gestures with musical beats. Three modes of controlling Vowel-Consonant-Vowel (VCV) articulation are discussed, corresponding to three levels of rhythmic precision and musical context. A VCV articulation is composed of the onset phase (articulators approaching the position of maximum constriction), the medial phase (maximum constriction) and the offset phase (constriction release). The offset phase of plosives is very short compared to that of other consonants. The first control mode consists of triggering the syllable at the beginning of the onset phase. However, when the syllable starts on the musical beat, it is perceived with a delay that depends on the duration of the articulation phases. Anticipating this delay precisely is very difficult. The second control mode triggers the VCV dissyllable in two steps.
In the first step, the onset phase is triggered, and in the second step, the offset phase is triggered. In this way, plosives can be synchronized with musical beats without any delay. The third control mode is a continuous control of the phases of articulation, without any triggering. This requires a fast synthesis engine, a high interface sampling rate, and an expert control gesture that is fast and precise enough to reproduce speech articulation phases. Continuous control of articulation is performed by a back-and-forth gesture with the pen held in the non-preferred hand, along the vertical dimension of the graphic tablet. Place of articulation is controlled continuously along the horizontal dimension, and each mode of articulation is assigned to a different area of the tablet. This back-and-forth gesture is analogous to the somewhat symmetric articulation of the VCV dissyllable. The gesture amplitude allows for different degrees of articulation (from hypoarticulation to hyperarticulation). Controlling the duration of each phase of articulation is another means of increasing expressiveness. The preferred hand controls pitch, vocal effort and vowel quality on another graphic tablet, so pitch and vocal effort can be modified during each phase of articulation. Cantor Digitalis and Digitartic allow for expressive musical performances. They are regularly used in concerts.
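
The abstract states that in-between places of articulation are obtained by linear interpolation of the consonant parameters. The following is a minimal sketch of that idea in Python, not the instruments' actual implementation. The normalized horizontal axis, the three canonical positions, the parameter names (`noise_center_hz`, `f2_locus_hz`) and all numeric values are hypothetical placeholders chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical canonical places on a normalized horizontal axis (0.0 to 1.0).
# Parameter names and values are placeholders, not taken from the paper.
CANONICAL_PLACES = [
    (0.0, {"noise_center_hz": 1000.0, "f2_locus_hz": 800.0}),
    (0.5, {"noise_center_hz": 3000.0, "f2_locus_hz": 1800.0}),
    (1.0, {"noise_center_hz": 2000.0, "f2_locus_hz": 2200.0}),
]


def interpolate_place(x):
    """Linearly interpolate consonant parameters at horizontal pen position x."""
    x = min(max(x, 0.0), 1.0)  # clamp to the assumed tablet range
    positions = [pos for pos, _ in CANONICAL_PLACES]
    # Index of the canonical place to the right of x, kept within bounds.
    hi = min(max(bisect_right(positions, x), 1), len(positions) - 1)
    (x0, left), (x1, right) = CANONICAL_PLACES[hi - 1], CANONICAL_PLACES[hi]
    t = (x - x0) / (x1 - x0)  # 0.0 at the left place, 1.0 at the right place
    return {name: (1.0 - t) * left[name] + t * right[name] for name in left}
```

Under these assumptions, a pen position halfway between two canonical places, for example `interpolate_place(0.25)`, yields parameters halfway between those of the two neighbouring places. A real instrument would keep one such table per mode of articulation.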