Abstract

Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects that represent either the physical sources or the temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We also consider the extent to which they include predictive processes, since important current theories suggest that perception is inherently predictive, and how the models have been evaluated. We conclude that current computational models of ASA are fragmentary: rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility of integrating complementary aspects of the models into a more comprehensive theory of ASA.

Highlights

  • In most situations, we receive sounds from an unknown number of different sources

  • Models based on Bayesian principles view auditory scene analysis (ASA) as a process assigning each segment of the input to one of the possible classes

  • The decision process ensures that the probability that the assigned class generated the segment is optimal, given the priors and the sound input
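The Bayesian view described in the highlights can be illustrated with a minimal sketch: each input segment is assigned to the class that maximizes the posterior probability of having generated it, given the class priors and the segment likelihoods. This is a hypothetical, generic MAP (maximum a posteriori) illustration, not an implementation of any specific published model; the likelihoods and priors here are assumed to be given, whereas in a real model they would come from learned generative models of the sound input.

```python
import numpy as np

def map_assign(likelihoods: np.ndarray, priors: np.ndarray) -> np.ndarray:
    """MAP class assignment for input segments.

    likelihoods: (n_segments, n_classes) array of p(segment | class)
    priors:      (n_classes,) array of p(class)
    Returns the index of the most probable class for each segment.
    """
    # Unnormalized posterior p(class | segment) via Bayes' rule
    joint = likelihoods * priors
    # Normalize over classes so each row sums to 1
    posterior = joint / joint.sum(axis=1, keepdims=True)
    # Pick the class with the highest posterior probability
    return posterior.argmax(axis=1)

# Toy example: two segments, three candidate source classes
likelihoods = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
priors = np.array([0.5, 0.3, 0.2])
print(map_assign(likelihoods, priors))  # → [0 2]
```

Normalizing the posterior is not strictly needed for the argmax decision, but it makes the intermediate quantity interpretable as a probability, which matters if the model instead samples among competing interpretations.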


Introduction

We receive sounds from an unknown number of different sources. The task of the auditory system is to parse the complex mixture in order to determine the likely sources of the incoming signals. The model of Akram et al. (2014b), based on a variant of the temporal coherence model (reviewed in full below), aims to test how an external attentional cue helps to select a single sound stream from a complex scene, whereas the model of Boes et al. (2011) determines the direction of possible sound sources, which could be used to direct attention, but does not attempt to group or segregate auditory objects.
