Background: Emotional speech synthesis is the process of synthesising emotions into neutral speech – potentially generated by a text-to-speech system – to make artificial human–machine interaction more human-like. It typically involves the analysis and modification of speech parameters. Existing work on emotional speech synthesis modifies prosody parameters at the sentence, word, and syllable levels. However, more fine-grained modification at the vowel level has not yet been explored, which motivates our work. Objective: To explore prosody parameters at the vowel level for emotion synthesis. Methods: Our work modifies prosody features (duration, pitch, and intensity) for emotion synthesis. Specifically, it modifies the duration parameter of vowel-like and pause regions, and the pitch and intensity parameters of only vowel-like regions. The modification is gender-specific, based on emotional speech templates stored in a database, and is performed using the Pitch Synchronous Overlap and Add (PSOLA) method. Results: A comparison was made with existing work on prosody modification at the sentence, word, and syllable levels on the IITKGP-SEHSC database. Improvements of 8.14%, 13.56%, and 2.80% in the relative mean opinion score were obtained for the emotions angry, happy, and fear, respectively. This is because: (1) prosody modification at the vowel level is more fine-grained than at the sentence, word, or syllable level, and (2) prosody patterns are not generated for consonant regions, since the vocal cords do not vibrate during consonant production. Conclusion: Our work shows that emotional speech generated using prosody modification at the vowel level is more convincing than that generated by prosody modification at the sentence, word, and syllable levels.
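
To illustrate the Methods step, the following is a minimal sketch of vowel-level prosody modification with PSOLA, assuming Praat is driven through the parselmouth Python library. The region boundaries (start_s, end_s) and the pitch, duration, and intensity factors are hypothetical placeholders; in the paper they are derived from gender-specific emotional speech templates, not hard-coded as here.

```python
# Sketch: PSOLA-based prosody modification restricted to one vowel-like
# region, assuming the parselmouth wrapper around Praat. Boundaries and
# scale factors are hypothetical illustration values.
import parselmouth
from parselmouth.praat import call

def modify_vowel_region(wav_path, start_s, end_s,
                        pitch_scale=1.2, dur_scale=1.1, gain_db=3.0):
    snd = parselmouth.Sound(wav_path)
    fs = snd.sampling_frequency

    # Intensity: scale the raw samples of the vowel region before
    # resynthesis (PSOLA itself only alters pitch and duration).
    factor = 10.0 ** (gain_db / 20.0)
    samples = snd.values.copy()              # shape: (channels, samples)
    i0, i1 = int(start_s * fs), int(end_s * fs)
    samples[:, i0:i1] *= factor
    snd = parselmouth.Sound(samples, sampling_frequency=fs)

    # Pitch and duration: build a Praat Manipulation object and edit
    # its pitch and duration tiers only inside the vowel region.
    manip = call(snd, "To Manipulation", 0.01, 75, 600)

    pitch_tier = call(manip, "Extract pitch tier")
    call(pitch_tier, "Multiply frequencies", start_s, end_s, pitch_scale)
    call([pitch_tier, manip], "Replace pitch tier")

    dur_tier = call(manip, "Extract duration tier")
    eps = 1e-4  # pin unit scaling just outside the region boundaries
    call(dur_tier, "Add point", start_s - eps, 1.0)
    call(dur_tier, "Add point", start_s, dur_scale)
    call(dur_tier, "Add point", end_s, dur_scale)
    call(dur_tier, "Add point", end_s + eps, 1.0)
    call([dur_tier, manip], "Replace duration tier")

    # PSOLA (overlap-add) resynthesis with the edited tiers.
    return call(manip, "Get resynthesis (overlap-add)")
```

The intensity gain is applied before resynthesis so that the sample indices still align with the original time axis; once the duration tier stretches the region, the timeline of the output no longer matches the input.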