Abstract

Current text-to-speech systems do not effectively convey the semantics and cognitive aspects of a document’s typographic cues (e.g., font type, style, and size). A novel approach is introduced for the acoustic rendition of text font, based on the emotional analogy between the visual modality (text font cues) and the acoustic modality (speech prosody). The methodology is based on: a) modeling the reader’s emotional state response (“Pleasure”, “Arousal”, and “Dominance”) induced by the document’s font cues and b) mapping that emotional state to acoustics using expressive speech synthesis. A case study of the proposed methodology was conducted by calculating the prosodic values for specific font cues (several font styles and font sizes) and by examining listeners’ preferences on the acoustic rendition of bold, italics, bold-italics, and various font sizes. The results of the user evaluation indicate that the acoustic rendition of font size variations, as well as of bold and italics, is recognized successfully, but bold-italics is confused with bold because their prosodic variations are similar.

Highlights

  • Written documents, either printed or electronic, include books, journals, newspapers, newsletters, gazettes, reports, letters, e-mails, and webpages

  • a) their results were based on stimuli that were different across cultures; b) possibly the differences are coming from the speech synthesizer itself; c) the underlying prosody was based on different speakers; and d) the translation of the sentences may have resulted in different semantics implying a different kind of appropriate emotion

  • Section 3.4 (Modeling font size and type to expressive speech synthesis): After a statistical survey conducted on a large number of textbooks and newspapers [63], we found that the most frequent font sizes that define the baseline of the documents are 10, 12, and 14 pt


Summary

Introduction

Written documents, either printed or electronic, include books, journals, newspapers, newsletters, gazettes, reports, letters, e-mails, and webpages. Focusing on typographic attributes and using the dimensional theory of emotions, Laarni [38] investigated the effects of color and font type/style on the “Pleasure”, “Arousal”, and “Dominance” scales according to users’ preferences. He also examined the impact of color on document aesthetics (e.g., red font on a green background was rated the most unpleasant combination, and black on white the least arousing). Also based on the dimensional theory of emotions, a recent study [8] investigates how typographic elements, such as font style (bold, italics, and bold-italics) and font attributes (type, size, color, and background color), affect the reader’s emotional states of “Pleasure”, “Arousal”, and “Dominance” (PAD). Newer methodologies [48] optimize existing ones or propose novel approaches such as expressivity-based selection of units, unit selection, and signal modification, as well as statistical parametric synthesis based on Hidden Markov Models [55].
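The two-step idea (font cue to PAD state, then PAD state to prosody) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `PAD`, `Prosody`, `HYPOTHETICAL_FONT_TO_PAD`, `pad_to_prosody`, and `render_font_cue` are hypothetical, the numeric PAD values are placeholders rather than measurements from the study, and the linear PAD-to-prosody rule is only one possible choice, not the mapping used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAD:
    """Emotional state on the Pleasure, Arousal, Dominance scales."""
    pleasure: float
    arousal: float
    dominance: float


@dataclass(frozen=True)
class Prosody:
    """Relative prosodic changes (percent) against a neutral voice."""
    pitch_pct: float
    rate_pct: float
    volume_pct: float


# Placeholder PAD responses per font style. These numbers are NOT taken
# from the paper; they only make the sketch runnable.
HYPOTHETICAL_FONT_TO_PAD = {
    "plain": PAD(0.0, 0.0, 0.0),
    "bold": PAD(0.0, 0.25, 0.5),
    "italics": PAD(0.25, 0.125, -0.25),
    "bold-italics": PAD(0.0, 0.25, 0.375),
}


def pad_to_prosody(pad: PAD, scale: float = 20.0) -> Prosody:
    """Assumed linear rule: arousal drives pitch, pleasure drives rate,
    dominance drives volume. The rule and `scale` are illustrative only."""
    return Prosody(
        pitch_pct=scale * pad.arousal,
        rate_pct=scale * pad.pleasure,
        volume_pct=scale * pad.dominance,
    )


def render_font_cue(style: str) -> Prosody:
    """Step a) look up the modeled PAD response; step b) map it to prosody."""
    return pad_to_prosody(HYPOTHETICAL_FONT_TO_PAD[style])


if __name__ == "__main__":
    for style in HYPOTHETICAL_FONT_TO_PAD:
        print(style, render_font_cue(style))
```

With placeholder values like these, bold and bold-italics end up with nearly identical prosody, which mirrors the kind of confusion between the two styles reported in the abstract.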

Expressive speech synthesis: the dimensional approach
Prosodic mapping of typography
Conclusions