In the study of how we perceive musical timbre, we encounter a number of different representations of the musical information, several of which have appeared in preceding papers in the symposium. The first is the musical score, which in engineering terms is a graph of frequency versus time, with appended annotations about loudness, scoring, timing, etc. The next representation is sound in a concert hall, created from the score by an orchestra or computer: technically, sound pressure as a function of time. The representations of interest in this paper are those produced by the mechanical and neural systems in our heads in response to this sound.

The First Auditory Representation

Properties of the Ear

Experiments involving auditory masking (see Patterson 1976), critical bands (summarized in Tobias 1970), basilar membrane motion (see Johnstone & Boyle 1967; Rhode 1971; Evans & Wilson 1973), and tuning curves of primary auditory nerve fibers (see Kiang 1965, 1974) all indicate that the first neural representation of sound in our heads results from a frequency analysis of the incoming sound by a fluid-filled bony structure called the cochlea. The frequency resolution of this system is somewhat less than one-third of an octave (above 400 Hz), as can be seen from Figure 1, the so-called critical bandwidth data derived from psychophysics. A corresponding set of psychophysical experiments indicates that the temporal resolution of the system is at best a few milliseconds (e.g., see Viemeister 1979). The phase sensitivity of the ear is still a controversial topic, but most studies indicate that the system is only minimally sensitive to the relative phase of harmonics. Of course, the ear is exquisitely sensitive to the interaural phase of any sine wave component below 500 Hz, but this gives rise to auditory localization, not to the perception of timbre. Hence in the remainder of this paper we will ignore phase.
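To put the one-third-octave figure quoted above in concrete terms, the following minimal sketch computes the width of a one-third-octave band in hertz and in semitones. It assumes band edges placed symmetrically about the center frequency on a logarithmic scale; the function names are illustrative and do not come from the paper.

```python
import math

# Assumption: the band edges lie one-sixth of an octave below and above
# the center frequency, so the band is symmetric on a log-frequency axis.
HALF_BAND_RATIO = 2.0 ** (1.0 / 6.0)


def third_octave_edges(center_hz):
    """Return the (lower, upper) edge frequencies in Hz of the band."""
    return center_hz / HALF_BAND_RATIO, center_hz * HALF_BAND_RATIO


def bandwidth_hz(center_hz):
    """Width of the one-third-octave band in Hz."""
    lower, upper = third_octave_edges(center_hz)
    return upper - lower


def interval_semitones(lower_hz, upper_hz):
    """Musical interval between two frequencies, in equal-tempered semitones."""
    return 12.0 * math.log2(upper_hz / lower_hz)


if __name__ == "__main__":
    lower, upper = third_octave_edges(1000.0)
    print(f"edges: {lower:.1f} Hz to {upper:.1f} Hz")
    print(f"bandwidth: {bandwidth_hz(1000.0):.1f} Hz")
    print(f"interval: {interval_semitones(lower, upper):.2f} semitones")
```

Under these assumptions, a band centered on 1000 Hz is about 232 Hz wide and spans four semitones, and the width in hertz scales in proportion to the center frequency.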
It may at first glance seem quite incongruous that the auditory system, which has excellent pitch discrimination (two or three hertz for 1000 Hz pure tones), should analyze sound with filters as broad as one-third of an octave, that is, three or four notes on the chromatic scale. But it is easy to show that if one uses not just one filter but several overlapping filters, then accurate information concerning pitch is available, limited only by the slope of the filter characteristic and not by the filter bandwidth. A small change in frequency alters the relative outputs of adjacent filters, and the steeper the filter skirts, the larger that alteration.

The Model

To obtain a better idea of this first neural representation of music and speech, we have constructed a model ear with properties approximating those of the human ear discussed above. Our model, shown in Figure 2, consists basically of a bank of one-third-octave filters covering 125 Hz to 6.3 kHz, followed by envelope detectors. To simulate the roughly constant critical bandwidth below 400 Hz, we added together the detector outputs of the 125- and 160-Hz channels, and also those of the 200- and 250-Hz channels (see Dockendorff 1978).

The detector time constants were chosen to produce a fast rise time consistent with low ripple. In filter systems such as this one, which have wider bandwidths at higher frequencies, the rise times of the filters decrease with increasing center frequency. Hence we chose the detector time constants to correspond, such that the overall rise times of the filter-detector units were inversely proportional to frequency. Specifically, the 1 kHz channel has an overall rise time of 6 milliseconds, the 2 kHz channel 3 milliseconds, and so forth.

As noted in Figure 2, the detectors are connected to a 16-channel CMOS multiplex switch, which samples the output of each channel every 1.6 milliseconds. (This rate is appropriate for the high-frequency channels, but oversamples the low channels.) The multiplexed output is then passed through a logarithmic amplifier to match the logarithmic nature of perceived loudness in the ear.
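The channel layout and timing described above can be tabulated in a short sketch. This is not the authors' implementation, which was analog hardware. It only reproduces the arithmetic under stated assumptions: the intermediate center frequencies are taken to be the standard nominal one-third-octave values (the text names only some of them), the rise-time rule is taken to be exactly 6 ms at 1 kHz scaled inversely with frequency, the multiplexer is assumed to visit channels at a uniform rate, and `log_compress` is a hypothetical stand-in for the logarithmic amplifier.

```python
import math

# Assumption: standard nominal one-third-octave center frequencies (Hz)
# between the 125 Hz and 6.3 kHz limits given in the text.
NOMINAL_CENTERS_HZ = [125, 160, 200, 250, 315, 400, 500, 630, 800, 1000,
                      1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300]

# From the text: detector outputs of these band pairs are added together.
MERGED_PAIRS = [(125, 160), (200, 250)]

# From the text: every channel is sampled once per 1.6 ms.
FRAME_PERIOD_MS = 1.6


def build_channels():
    """Return the channels as tuples of the filter bands feeding each one."""
    merged = {f for pair in MERGED_PAIRS for f in pair}
    channels = list(MERGED_PAIRS)
    channels += [(f,) for f in NOMINAL_CENTERS_HZ if f not in merged]
    return sorted(channels)


def rise_time_ms(center_hz):
    """Overall rise time, assuming exact inverse proportionality to
    frequency anchored at 6 ms for the 1 kHz channel."""
    return 6.0 * 1000.0 / center_hz


def slot_period_ms(n_channels):
    """Time per multiplexer step, assuming channels are visited uniformly."""
    return FRAME_PERIOD_MS / n_channels


def log_compress(envelope, floor=1e-6):
    """Hypothetical logarithmic amplifier: level in dB relative to 1.0.
    The floor avoids taking the logarithm of zero."""
    return 20.0 * math.log10(max(envelope, floor))


if __name__ == "__main__":
    channels = build_channels()
    print(f"{len(channels)} channels, "
          f"{slot_period_ms(len(channels)):.2f} ms per multiplexer step")
    for bands in channels:
        if len(bands) == 1:
            print(f"{bands[0]:>5} Hz  rise time {rise_time_ms(bands[0]):.2f} ms")
        else:
            print(f"{bands[0]}+{bands[1]} Hz  (summed detector outputs)")
```

With the two low-frequency pairs summed, the 18 filter bands reduce to 16 channels, which matches the 16-channel multiplexer. Under the assumed rise-time rule, the 6.3 kHz channel rises in just under 1 ms, comparable to the 1.6 ms sampling interval, while the low channels rise far more slowly and are therefore oversampled.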
There are several important aspects of the human auditory system that are not modeled by this system, such as two-tone inhibition, the limited dynamic range of the neural system, etc. …