Abstract

When we hear a new voice on the radio, we can tell whether the speaker is an adult or a child. We can also extract the message of the communication without being confused by the size information. This shows that auditory signal processing is scale invariant, automatically segregating information about vocal tract shape from information about vocal tract length. Patterson and colleagues have performed a series of experiments to measure the characteristics of size/shape perception [e.g., Smith et al., J. Acoust. Soc. Am. 117(1), 305–318 (2005)], and provided a mathematical basis for auditory scale invariance in the form of the stabilized wavelet-Mellin transform (SWMT) [Irino and Patterson, Speech Commun. 36(3–4), 181–203 (2002)]. The mathematics of the SWMT dictates the optimal form of the auditory filter, insofar as it must satisfy minimal uncertainty in a time-scale representation [Irino and Patterson, J. Acoust. Soc. Am. 101(1), 412–419 (1997)]. The resulting gammachirp auditory filter is an asymmetric extension of the earlier gammatone auditory filter—one which can explain the level dependence of notched-noise masking. Thus, although it is not immediately intuitive, speaker size perception and auditory filter shape are both aspects of a larger, unified framework for auditory signal processing.

