Abstract

Previous attempts at synthesizing voiced fricatives have failed to yield acceptably natural‐sounding segments. One possible reason for this failure is that the synthesis models have been global, whereas some important characteristics of these segments may reside in their period‐by‐period structure. The goal of this study was therefore to compare the auditory quality of natural /v, ð, z, ʒ/ segments, produced in various V‐V contexts, with that of digitally manipulated versions of the same segments. High‐quality tokens were digitized at 24 kHz, and individual pitch periods were marked by hand using interactive software. Comparison stimuli were then created by the following manipulations: (1) reordering of alternate periods; (2) reordering of triples of periods; (3) replacement of all odd‐ (even‐) numbered periods by their even‐ (odd‐) numbered neighbors; (4) analogous replacements modulo 3; (5) random reordering of periods. In some conditions, these manipulations were restricted to the onset, steady‐state, or offset portion of the segment. The spliced segments were then presented to listeners for discrimination, naturalness, and likeness judgments. The results of these comparisons and their implications for synthesizing natural‐sounding voiced fricatives will be discussed.
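The five manipulations, and their restriction to one portion of a segment, can be viewed as index operations on the sequence of hand‐marked pitch periods. The following is a minimal Python sketch under stated assumptions, not the procedure used in the study. It assumes that each period has already been cut out as a list of samples. It also adopts one reading of each ambiguous manipulation: "reordering of alternate periods" is taken to mean swapping adjacent pairs, "reordering of triples" is taken to mean rotating each triple by one position, and the modulo‐3 replacements are taken to mean replacing every period in a triple by one chosen member. All function names and the region boundaries are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # one pitch period, e.g., a list of samples (assumed representation)


def swap_pairs(periods: Sequence[T]) -> List[T]:
    """Manipulation (1), assumed reading: swap adjacent pairs, 1 2 3 4 -> 2 1 4 3."""
    out = list(periods)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out  # an unpaired final period is left in place


def rotate_triples(periods: Sequence[T]) -> List[T]:
    """Manipulation (2), assumed reading: rotate each complete triple, 1 2 3 -> 2 3 1."""
    out = list(periods)
    for i in range(0, len(out) - 2, 3):
        out[i:i + 3] = [out[i + 1], out[i + 2], out[i]]
    return out  # an incomplete trailing triple is left in place


def replace_within_groups(periods: Sequence[T], group: int, keep: int) -> List[T]:
    """Manipulations (3) and (4), assumed reading: within each complete group of
    `group` periods, every period is replaced by the one at offset `keep`.
    With periods numbered from 1, group=2, keep=1 replaces odd-numbered periods
    by their even-numbered neighbors; group=2, keep=0 does the reverse;
    group=3 gives the modulo-3 variants. A trailing partial group is unchanged."""
    if not 0 <= keep < group:
        raise ValueError("keep must satisfy 0 <= keep < group")
    out = list(periods)
    full = len(out) - len(out) % group  # length covered by complete groups
    for i in range(0, full, group):
        out[i:i + group] = [out[i + keep]] * group
    return out


def shuffle_periods(periods: Sequence[T], seed: int) -> List[T]:
    """Manipulation (5): random reordering, made reproducible by `seed`."""
    out = list(periods)
    random.Random(seed).shuffle(out)
    return out


def in_region(
    manip: Callable[[Sequence[T]], List[T]], start: int, stop: int
) -> Callable[[Sequence[T]], List[T]]:
    """Restrict a manipulation to periods[start:stop], e.g., the onset,
    steady-state, or offset portion (boundaries are hypothetical inputs)."""
    def apply(periods: Sequence[T]) -> List[T]:
        out = list(periods)
        out[start:stop] = manip(out[start:stop])
        return out
    return apply


def splice(periods: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the manipulated periods back into a single waveform."""
    return [sample for period in periods for sample in period]


if __name__ == "__main__":
    labels = [1, 2, 3, 4, 5, 6, 7]  # period numbers stand in for waveforms
    print(swap_pairs(labels))                   # [2, 1, 4, 3, 6, 5, 7]
    print(rotate_triples(labels))               # [2, 3, 1, 5, 6, 4, 7]
    print(replace_within_groups(labels, 2, 1))  # [2, 2, 4, 4, 6, 6, 7]
    print(in_region(swap_pairs, 2, 6)(labels))  # [1, 2, 4, 3, 6, 5, 7]
```

Operating on period labels rather than samples keeps each manipulation independent of how the waveform is stored. The only step that touches samples is `splice`, which joins the reordered periods end to end. A real implementation would also have to handle discontinuities at the splice points, which this sketch ignores.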
