Several studies have shown that the ability to identify the timbre of musical instruments is reduced in cochlear implant (CI) users compared with normal-hearing (NH) listeners. However, most of these studies have focused on tasks that require specific musical knowledge. In contrast, the present study investigates the perception of timbre by CI subjects using a multidimensional scaling (MDS) paradigm. The main objective was to investigate whether CI subjects use the same cues as NH listeners do to differentiate the timbre of musical instruments. Three groups of 10 NH subjects and one group of 10 CI subjects were asked to make dissimilarity judgments between pairs of instrumental sounds. The stimuli were 16 synthetic instrument tones spanning a wide range of instrument families. All sounds had the same fundamental frequency (261 Hz) and were balanced in loudness and in perceived duration before the experiment. One group of NH subjects listened to unprocessed stimuli. The other two groups of NH subjects listened to the same stimuli passed through a four-channel or an eight-channel noise vocoder, designed to simulate the signal processing performed by a real CI. Subjects were presented with all possible combinations of pairs of instruments and had to estimate, for each pair, the amount of dissimilarity between the two sounds. These estimates were used to construct dissimilarity matrices, which were further analyzed using an MDS model. The model output gave, for each subject group, an optimal graphical representation of the perceptual distances between stimuli (the so-called "timbre space"). For all groups, the first two dimensions of the timbre space were strikingly similar and correlated strongly with the logarithm of the attack time and with the center of gravity of the spectral envelope, respectively. The acoustic correlate of the third dimension differed across groups but only accounted for a small proportion of the variance explained by the MDS solution. 
Surprisingly, CI subjects and NH subjects listening to noise-vocoded simulations gave relatively more weight to the spectral envelope dimension and less weight to the attack-time dimension when making their judgments than NH subjects listening to unprocessed stimuli. One possible reason for the relatively higher salience of spectral envelope cues in real and simulated CIs may be that the degradation of local fine spectral details produced a more stable spectral envelope across the stimulus duration. The internal representation of musical timbre for isolated musical instrument sounds was found to be similar in NH and in CI listeners. This suggests that training procedures designed to improve timbre recognition in CIs will indeed train CI subjects to use the same cues as NH listeners. Furthermore, NH subjects listening to noise-vocoded sounds appear to be a good model of CI timbre perception as they show the same first two perceptual dimensions as CI subjects do and also exhibit a similar change in perceptual weights applied to these two dimensions. This last finding validates the use of simulations to evaluate and compare training procedures to improve timbre perception in CIs.
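The analysis step described above, from pairwise dissimilarity judgments to a low-dimensional "timbre space", can be illustrated with a minimal sketch. The abstract does not say which MDS model was used, so classical (Torgerson) MDS stands in here as an assumption. The function names, the simulated ratings, and the noise level are all hypothetical and serve only to make the example self-contained; they are not the study's data or method.

```python
import numpy as np


def mean_dissimilarity(ratings):
    """Average per-subject dissimilarity matrices and force symmetry."""
    mean = np.mean(np.asarray(ratings, dtype=float), axis=0)
    sym = (mean + mean.T) / 2.0
    np.fill_diagonal(sym, 0.0)  # a sound is not dissimilar to itself
    return sym


def classical_mds(dissimilarity, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances
    approximate the given dissimilarities."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to get a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: 16 stimuli placed in a 2-D "true" perceptual space.
rng = np.random.default_rng(0)
true_points = rng.normal(size=(16, 2))
true_dist = np.linalg.norm(
    true_points[:, None, :] - true_points[None, :, :], axis=-1
)

# Ten simulated subjects, each rating every pair with a little noise.
ratings = [
    np.abs(true_dist + rng.normal(scale=0.05, size=true_dist.shape))
    for _ in range(10)
]

group_matrix = mean_dissimilarity(ratings)
timbre_space = classical_mds(group_matrix, n_dims=2)
print(timbre_space.shape)
```

In a real analysis, each recovered dimension would then be correlated with candidate acoustic descriptors (for example, log attack time or spectral centroid) to interpret it. Classical MDS also gives no per-group dimension weights, so comparing perceptual weights across listener groups would call for a weighted MDS model instead.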