Abstract

In environments with many concurrent sound sources, it is often possible to hear out individual sounds and identify their sources. To accomplish this perceptual organization, listeners analyze the mixture of sounds to estimate which constituent components belong to a single acoustic source. Signal components that are likely to come from the same source are grouped into an auditory object, which is segregated from the acoustic complex. This grouping is based on features that the components share, such as common spectral cues (the same fundamental frequency or frequency modulation), temporal cues (the same temporal envelope), or spatial cues (the same interaural differences). When the onsets of signal components fall within a time window short enough for them to be perceived as synchronous, the components are combined into a new auditory object, which is then monitored over time for cues that sustain the grouping. Given that spatial cues are inherent to every sound and already present at its onset, and that they enhance detection of a sound through spatial release from masking, it is unclear whether, and to what extent, temporal perception depends on the presence of spatial cues. The research presented here addressed two questions: the contribution of the binaural hearing system, which processes a sound’s spatial cues, to the processing of its temporal cues, and the contribution of the sounds’ temporal cues to the processing of their spatial cues. To assess the contribution of spatial cues to temporal perception, experiments were conducted in which subjects adjusted the temporal position of a target sound’s onset until it reached its desired position relative to a regular series of reference sound onsets. Manipulating the target’s interaural differences, its sensation level, and the durations of the target and reference sounds allowed the effects of these factors on temporal perception of the target’s onset to be compared.
The main finding was that the precision of temporal perception degraded below a critical signal-to-noise ratio. This critical level depended on the absolute target and masker levels and on the difference in spatial configuration between target and masker, which facilitated perception at much lower signal-to-noise ratios. When a sound’s signal-to-noise ratio falls below this critical level, the degraded precision enlarges the temporal window for grouping signal components’ onsets. As a result, signal components are more likely to be grouped incorrectly into an auditory object, or to be prevented from forming a new auditory object and to remain part of the acoustic complex until other evidence for the existence of their source is gathered. This may contribute to the decrease of speech intelligibility in noisy environments, where precise temporal perception of transients is required for the processing of speech signals.
To assess the contribution of temporal cues to binaural processing, thresholds were measured for discrimination between the spatial configurations of two sounds that had identical spectral ranges and identical onset and offset times, but that differed in temporal envelope structure and carried directional information of equal magnitude toward opposite sides of the median plane. The minimum interaural difference in time or level at which the sounds’ spatial configurations could be discriminated represented the ability to process binaural cues, and allowed the effects of the sounds’ bandwidths, center frequencies, and temporal envelopes to be compared. The main findings were that discrimination based on interaural level differences was good in all tested conditions (although absolute thresholds depended on the specific conditions), and that discrimination based on interaural time differences was possible only when the temporal envelopes of the sounds were sufficiently different.
The binaural system must, therefore, have been capable of processing the interaural differences within short periods of time during which one of the sounds was dominant in the monaural temporal envelopes of the combined signals at each ear. Analysis of the monaural temporal envelopes influences the ability to use binaural information to separate individual signal components, and allows signal components with identical spatial cues to be tracked across time to form a perceptual unit, the auditory object. In conditions where discrimination was not possible, the temporal envelopes of the individual sounds were too similar, and such periods did not occur systematically in the temporal envelopes of the combined sounds. This finding suggests that a monaural analysis of a composite signal’s temporal envelope is a prerequisite for the binaural processing of the interaural time differences of its constituent signal components. The implication is that, if the signals’ temporal envelopes are too similar, their signal components may not be correctly grouped into separate and individually lateralized auditory objects, but may instead remain fused as a single object with an averaged spatial image.
