Abstract

Humans and vocal animals use vocalizations to communicate with members of their species. A necessary function of auditory perception is to generalize across the high variability inherent in vocalization production and classify vocalizations into behaviorally distinct categories (‘words’ or ‘call types’). Here, we demonstrate that detecting mid-level features in calls achieves production-invariant classification. Starting from randomly chosen marmoset call features, we use a greedy search algorithm to determine the most informative and least redundant features necessary for call classification. High classification performance is achieved using only 10–20 features per call type. Predictions of tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach also succeeds for call categorization in other species, and for other complex classification tasks such as caller identification. Our results suggest that high-level neural representations of sounds are based on task-dependent features optimized for specific computational goals.

Highlights

  • Humans and vocal animals use vocalizations to communicate with members of their species

  • The behavioral salience of calls for marmosets[4,5,6,7,8], and the increasing resources allocated to the processing of calls along the cortical processing hierarchy[16], suggest that call processing is a computational goal of auditory cortex

  • We start with the premise that the first step in call processing is the categorization of calls into discrete call types, generalizing across the production variability that is inherent to calls


Summary

Introduction

Humans and vocal animals use vocalizations to communicate with members of their species. Predictions of tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach succeeds for call categorization in other species, and for other complex classification tasks such as caller identification.

Face detection algorithms use combinations of mid-level features, such as regions with specific contrast relationships[13,14], or combinations of face parts[12], to accomplish classification. Of these algorithms, the one proposed by Ullman et al.[12] is especially interesting because of its potential to generalize to other classification tasks across sensory modalities. In this algorithm, starting from a set of random fragments of faces, the authors used greedy search to extract the most informative fragments that were highly conserved across all faces despite within-class variability.

A number of studies have described call-selective responses at various stages of the auditory pathway.
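The greedy search for informative, non-redundant features can be illustrated with a small sketch. It is not the authors' implementation: it assumes each candidate feature has already been reduced to a binary detection per call (1 if detected, 0 if not), and it assumes a max-min selection rule in which a candidate is scored by the smallest amount of information it adds beyond any feature already chosen. The function names, the `min_gain` threshold, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def greedy_select(detections, labels, max_features, min_gain=1e-9):
    """Greedily pick feature indices that are informative and not redundant.

    detections[i][j] is 1 if (hypothetical) feature i is detected in call j.
    labels[j] is 1 if call j belongs to the target call type.
    """
    # Information each feature carries about the label on its own.
    merit = [mutual_information(row, labels) for row in detections]
    # Seed with the single most informative feature.
    selected = [max(range(len(detections)), key=merit.__getitem__)]
    while len(selected) < max_features:
        best, best_gain = None, min_gain
        for i, row in enumerate(detections):
            if i in selected:
                continue
            # Assumed criterion: information the pair (i, s) carries beyond s
            # alone, minimized over every already-selected feature s.
            gain = min(
                mutual_information(list(zip(row, detections[s])), labels) - merit[s]
                for s in selected
            )
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nothing adds more than min_gain bits
            break
        selected.append(best)
    return selected


# Toy data: 8 calls, the first 4 belong to the target call type.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
detections = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 0: informative
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 1: exact duplicate of feature 0
    [0, 0, 0, 1, 0, 0, 0, 0],  # feature 2: catches the call feature 0 misses
    [1, 0, 1, 0, 1, 0, 1, 0],  # feature 3: uninformative on its own
]
print(greedy_select(detections, labels, max_features=2))  # [0, 2]
```

In this toy case the duplicate feature is never chosen, because it adds no information beyond feature 0, while the weaker but complementary feature 2 is picked second. With so few calls, chance correlations can give small positive gains to uninformative features, so a real analysis would need many more samples and a less permissive `min_gain`.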

