Abstract
Sounds in our environment like voices, animal calls or musical instruments are easily recognized by human listeners. Understanding the key features underlying this robust sound recognition is an important question in auditory science. Here, we studied the recognition by human listeners of new classes of sounds: acoustic and auditory sketches, sounds that are severely impoverished but still recognizable. Starting from a time-frequency representation, a sketch is obtained by keeping only sparse elements of the original signal, here, by means of a simple peak-picking algorithm. Two time-frequency representations were compared: a biologically grounded one, the auditory spectrogram, which simulates peripheral auditory filtering, and a simple acoustic spectrogram, based on a Fourier transform. Three degrees of sparsity were also investigated. Listeners were asked to recognize the category to which a sketch sound belongs: singing voices, bird calls, musical instruments, and vehicle engine noises. Results showed that, with the exception of voice sounds, very sparse representations of sounds (10 features, or energy peaks, per second) could be recognized above chance. No clear differences could be observed between the acoustic and the auditory sketches. For the voice sounds, however, a completely different pattern of results emerged, with at-chance or even below-chance recognition performances, suggesting that the important features of the voice, whatever they are, were removed by the sketch process. Overall, these perceptual results were well correlated with a model of auditory distances, based on spectro-temporal excitation patterns (STEPs). This study confirms the potential of these new classes of sounds, acoustic and auditory sketches, to study sound recognition.
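The peak-picking step described above (keeping only a handful of time-frequency energy peaks per second) can be illustrated with a minimal sketch. This is a hypothetical toy version, not the authors' implementation: it builds an acoustic spectrogram with SciPy and keeps only the largest bins, zeroing everything else.

```python
import numpy as np
from scipy.signal import spectrogram

def sparse_sketch(signal, fs, peaks_per_second=10):
    """Keep only the largest time-frequency bins of a magnitude spectrogram.

    A toy stand-in for the peak-picking used to build acoustic sketches:
    roughly `peaks_per_second` energy peaks are retained per second of audio.
    """
    # Acoustic spectrogram via short-time Fourier transform
    freqs, times, S = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)
    duration = len(signal) / fs
    n_peaks = max(1, int(peaks_per_second * duration))
    sparse = np.zeros_like(S)
    # Indices of the n_peaks largest bins (flattened, then unraveled to 2-D)
    idx = np.unravel_index(np.argsort(S, axis=None)[-n_peaks:], S.shape)
    sparse[idx] = S[idx]
    return freqs, times, sparse

# Example: a 1 s, 440 Hz tone at 16 kHz, reduced to ~10 features
fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
freqs, times, sp = sparse_sketch(x, fs, peaks_per_second=10)
print(np.count_nonzero(sp))
```

A real sketch would then resynthesize audio from the sparse representation; the auditory variant would replace the Fourier spectrogram with a model of peripheral auditory filtering.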
Highlights
Although human listeners can apparently recognize, with no effort, very diverse sound sources in their surrounding environment, the literature on the recognition of natural sounds and on the features listeners use to recognize them is relatively scant (e.g., [1,2]).
Moerel et al. [14] found that voice and speech regions responded to low-level features, with a bias toward the low frequencies characteristic of the human voice. This result is consistent with the theoretical approach proposed by Smith and Lewicki [19], which shows that the auditory code is optimal for natural sounds and, in particular, suggests that the acoustic structure of speech may be adapted to the physiology of the peripheral auditory system.
We studied acoustic and auditory sketches, new classes of sounds based on sparse representations, which are severely impoverished versions of original sounds.
Summary
Although human listeners can apparently recognize, with no effort, very diverse sound sources in their surrounding environment, the literature on the recognition of natural sounds and on the features listeners use to recognize them is relatively scant (e.g., [1,2]). Taking low-level acoustic features carefully into account, other models have been proposed, involving distributed neural representations across the entire human auditory cortex for both low-level features and abstract category encoding [13,14,15]. These studies showed that a complex spectro-temporal pattern of features represents the auditory encoding of natural sounds more accurately than a purely spectral or temporal approach (see [16] for animal sounds only; [13,17]; see [18] for a computational and psychophysical approach). This result is consistent with the theoretical approach proposed by Smith and Lewicki [19], which shows that the auditory code is optimal for natural sounds and, in particular, suggests that the acoustic structure of speech may be adapted to the physiology of the peripheral auditory system.