Abstract

The identification of the sound sources present in the environment is essential for the survival of many animals. However, these sounds rarely occur in isolation: natural scenes consist of a superposition of sounds originating from multiple sources. Identifying a source under these circumstances is a complex computational problem that is nonetheless readily solved by most animals. We present a model of the thalamocortical circuit that performs level-invariant recognition of auditory objects in complex auditory scenes. The circuit identifies the objects present from a large dictionary of possible elements and operates reliably on real sound signals with multiple concurrently active sources. The key model assumption is that the activities of some cortical neurons encode the difference between the observed signal and an internal estimate. Reanalysis of recordings from awake auditory cortex revealed neurons with patterns of activity corresponding to such an error signal.
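To make the key assumption concrete, the following minimal sketch (synthetic data and illustrative variable names; not the circuit model itself) computes an error signal as the difference between an observed spectral frame and an internal estimate of it:

```python
import numpy as np

# Hedged illustration of the key model assumption: some cortical units encode
# an "error" response, i.e. the observed input minus an internal estimate of it.
# All signals below are synthetic placeholders.
rng = np.random.default_rng(0)

observed = np.abs(rng.normal(size=64))                          # e.g. one spectral frame of the scene
internal_estimate = 0.8 * observed + 0.1 * rng.normal(size=64)  # imperfect internal prediction

error_signal = observed - internal_estimate      # what the error units are assumed to encode
error_rate = np.maximum(error_signal, 0.0)       # rectified version, if rates must be non-negative

print("mean |error|:", float(np.mean(np.abs(error_signal))))
```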

Highlights

  • Auditory scenes are generally composed of sounds produced by multiple sources

  • We propose a new algorithm, the Corrected Projections Algorithm (CPA), which identifies the sources present in a complex auditory scene

  • CPA belongs to a family of algorithms that identify the few elements, out of a large dictionary of possible sources, needed to reconstruct the observed signal (see the sketch below)
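As a hedged illustration of this family of algorithms, rather than of CPA itself, the sketch below greedily selects the few dictionary columns that best explain an observed spectrum; the dictionary `D`, the stopping threshold, and the least-squares refit are all illustrative choices:

```python
import numpy as np

def greedy_identify(observed, D, max_sources=5, tol=1e-3):
    """Pick the few columns of dictionary D that reconstruct `observed`.

    Illustrative stand-in for dictionary-based identification, not the
    CPA update rules described in the paper.
    """
    residual = observed.astype(float).copy()
    selected = []
    for _ in range(max_sources):
        scores = D.T @ residual              # correlate residual with every candidate source
        best = int(np.argmax(scores))
        if scores[best] <= tol:              # nothing left to explain
            break
        selected.append(best)
        # Refit the amplitudes of all selected sources, then update the residual
        coeffs, *_ = np.linalg.lstsq(D[:, selected], observed, rcond=None)
        residual = observed - D[:, selected] @ coeffs
    return selected, residual
```

With a normalized, non-negative dictionary, the returned residual plays the same role as the error signal described in the abstract: it is the part of the scene not yet accounted for by the identified sources.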

Introduction

Auditory scenes are generally composed of sounds produced by multiple sources. The observed complex auditory signal is a superposition of these sources, making the identification of the individual sound elements a non-trivial problem (Fig. 1A). While humans generally perform better than machines at recognizing auditory objects in complex scenes, it is not yet known how our nervous system performs this task in real time. In one family of computer algorithms, the blind source separation algorithms (Fig. 1B), source elements are identified using only information extracted from the observed signal. These approaches make no parametric assumptions about the superimposed signals in the auditory scene. Without such prior information, however, the amount of data needed to identify the sources present in a scene is large, making these approaches incompatible with the real-time requirements of biological systems.
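For concreteness, a minimal synthetic example of this superposition (arbitrary sources and sample rate, not taken from the paper) is:

```python
import numpy as np

# Two synthetic sources and their superposition: the ear receives only the sum,
# so the association between samples and sources is lost.
fs = 16000                                    # sample rate in Hz (arbitrary)
t = np.arange(0, 0.5, 1.0 / fs)

source_a = 0.5 * np.sin(2 * np.pi * 220 * t)                   # tonal source
source_b = 0.3 * np.random.default_rng(1).normal(size=t.size)  # noise-like source

observed_scene = source_a + source_b          # the only signal available to the listener
```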
