This study investigates autocorrelation-based features as a potential basis for phonetic and syllabic distinctions. The work grows out of a theory of auditory signal processing based on central monaural autocorrelation and binaural cross-correlation representations. Correlation-based features are used to predict monaural and binaural perceptual attributes that are important for the architectural acoustic design of concert halls: pitch, timbre, loudness, duration, reverberation-related coloration, sound direction, apparent source width, and envelopment (Ando, 1985, 1998; Ando and Cariani, 2009). Here, features of the monaural autocorrelation function (ACF) are used to represent phonetic elements (vowels), syllables (CV pairs), and phrases by means of a small set of temporal factors extracted from the short-term running ACF. These factors are listening level (loudness), zero-lag ACF peak width (spectral tilt), τ1 (voice pitch period), φ1 (voice pitch strength), τe (effective duration of the ACF envelope, reflecting temporal repetitive continuity/contrast), segment duration, and Δφ1/Δt (the rate of change of pitch strength, related to voice pitch attack-decay dynamics). Times at which the ACF effective duration τe is minimal reflect rapid changes in signal pattern and usefully demarcate segmental boundaries. The results suggest that vowels, CV syllables, and phrases can be distinguished on the basis of this ACF-derived feature set.
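As a rough illustration only (not the authors' implementation), the sketch below shows one way the pitch-related factors τ1, φ1, and τe might be computed from a single windowed speech frame. The 60-400 Hz pitch search range, the use of a linear fit to the ACF peak envelope, and the -10 dB decay criterion for τe are assumptions made here for concreteness.

```python
# Minimal sketch (assumed parameters, not the paper's exact method):
# extract tau_1 (pitch period), phi_1 (pitch strength), and tau_e
# (effective duration) from one windowed frame of a speech signal.
import numpy as np
from scipy.signal import find_peaks

def acf_factors(frame, fs):
    """Return (tau_1 [s], phi_1, tau_e [s]) for one analysis frame."""
    frame = frame - frame.mean()
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    nacf = acf / acf[0]                       # normalized ACF, lags 0..N-1

    # tau_1, phi_1: delay and height of the first major NACF peak within
    # an assumed voice-pitch range of 60-400 Hz.
    lo, hi = int(fs / 400), min(int(fs / 60), len(nacf) - 1)
    k = lo + int(np.argmax(nacf[lo:hi]))
    tau_1, phi_1 = k / fs, float(nacf[k])

    # tau_e: effective duration, estimated here as the delay at which a
    # straight line fitted to the NACF peak envelope (10*log10 dB) reaches
    # -10 dB, i.e. one tenth of the zero-lag value.
    peaks, _ = find_peaks(np.abs(nacf))
    env_db = 10 * np.log10(np.abs(nacf[peaks]) + 1e-12)
    slope, intercept = np.polyfit(peaks / fs, env_db, 1)
    tau_e = (-10.0 - intercept) / slope if slope < 0 else len(nacf) / fs
    return tau_1, phi_1, tau_e

# Example: a 40 ms frame of a synthetic 150 Hz "vowel-like" pulse train at 16 kHz.
fs = 16000
t = np.arange(int(0.040 * fs)) / fs
frame = np.sign(np.sin(2 * np.pi * 150 * t)) * np.hanning(len(t))
print(acf_factors(frame, fs))
```

Applied frame by frame along the running ACF, local minima of the resulting τe track would then mark candidate segmental boundaries, as described in the abstract.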