Abstract

Our acoustical environment abounds with repetitive sounds, some of which are related to pitch perception. It is still unknown how the auditory system, in processing these sounds, relates a physical stimulus to its percept. Since, in mammals, all auditory stimuli are conveyed to the nervous system through the auditory nerve (AN) fibers, a model should explain the perception of pitch as a function of this particular input. However, pitch perception is invariant to certain features of the physical stimulus. For example, a missing-fundamental stimulus with resolved or unresolved harmonics, or low-level and high-level stimuli with the same spectral content, all give rise to the same pitch percept. In contrast, the AN representations of these stimuli are not invariant to these features. In fact, due to saturation and nonlinearity of both the cochlear and inner hair cell responses, these differences are enhanced by the AN fibers. It is therefore difficult to explain how the pitch percept arises from the activity of the AN fibers. We introduce a novel approach for extracting pitch cues from the AN population activity for an arbitrary stimulus. The method is based on a technique known as sparse coding (SC): pitch cues are represented by a few spatiotemporal atoms (templates) selected from a large set of possible ones (a dictionary). The activity of each selected atom is represented by a non-zero coefficient, analogous to an active neuron. Such a technique has been successfully applied to other modalities, particularly vision. The model is composed of a cochlear model, an SC processing unit, and a harmonic sieve. We show that the model copes with different pitch phenomena: extracting resolved and unresolved harmonics, missing-fundamental pitches, stimuli with both high and low amplitudes, iterated rippled noises, and recorded musical instruments.
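To make the missing-fundamental idea and the role of a harmonic sieve concrete, here is a minimal sketch in Python. It is a generic textbook-style sieve, not the sieve used in the model: the function name `harmonic_sieve`, the 3% tolerance, the candidate range, and the tie-breaking rule are all assumptions made for illustration.

```python
def harmonic_sieve(components_hz, f0_candidates_hz, tolerance=0.03):
    """Return the candidate F0 whose harmonic series best fits the components.

    Hypothetical helper: tolerance and scoring are illustrative assumptions.
    """
    best_key, best_f0 = None, None
    for f0 in f0_candidates_hz:
        matched, total_dev = 0, 0.0
        for f in components_hz:
            n = max(1, round(f / f0))      # nearest harmonic number
            dev = abs(f - n * f0) / f      # relative mistuning from that harmonic
            if dev <= tolerance:           # component passes through the sieve
                matched += 1
                total_dev += dev
        # Prefer more matched components, then smaller mistuning, then the
        # highest F0 (the last rule rejects subharmonics such as F0/2).
        key = (matched, -total_dev, f0)
        if best_key is None or key > best_key:
            best_key, best_f0 = key, f0
    return best_f0


candidates = range(50, 501)  # candidate F0s in Hz (assumed search range)

# Harmonics 4, 5 and 6 of 200 Hz: the fundamental itself is absent.
missing_fundamental = harmonic_sieve([800, 1000, 1200], candidates)
# Harmonics 1, 2 and 3 of 200 Hz: the fundamental is present.
with_fundamental = harmonic_sieve([200, 400, 600], candidates)
print(missing_fundamental, with_fundamental)
```

Both stimuli are assigned the same fundamental even though their spectra differ, which is the invariance the abstract describes. In the model, the sieve would operate on the output of the SC unit rather than on hand-picked frequencies.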

Highlights

  • The perception of pitch is an important feature of speech recognition and perception of musical melodies

  • To demonstrate the advantage of using sparse coding algorithms, we compared the performance of the algorithm (Eq 2) for sparse (λ = 0.01) and non-sparse (λ = 0, least squares) solutions

  • We showed that a model based on the sparse coding of the spatiotemporal pattern of auditory nerve responses is consistent with many pitch perception phenomena
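The difference between a sparse (λ = 0.01) and a non-sparse (λ = 0) solution can be illustrated with a toy problem. Eq 2 is not reproduced in this summary, so the sketch below assumes the standard sparse coding objective ½‖x − Dα‖² + λ‖α‖₁ and solves it with iterative soft thresholding (ISTA), which may differ from the solver used in the paper. The three-atom dictionary, the signal, and the function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 penalty)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(x, atoms, lam, step, n_iter=2000):
    """ISTA for 0.5 * ||x - D a||^2 + lam * ||a||_1 (assumed objective)."""
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Reconstruction from the current coefficients, and its residual.
        recon = [sum(c * atom[i] for c, atom in zip(coeffs, atoms))
                 for i in range(len(x))]
        resid = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step on the quadratic term, then soft thresholding.
        coeffs = [
            soft_threshold(
                c + step * sum(a * r for a, r in zip(atom, resid)),
                step * lam,
            )
            for c, atom in zip(coeffs, atoms)
        ]
    return coeffs


# Toy overcomplete dictionary: three unit-norm atoms in a 2-D signal space.
s = 1.0 / math.sqrt(2.0)
atoms = [(1.0, 0.0), (0.0, 1.0), (s, s)]
x = (1.0, 1.0)   # the signal lies exactly along the third atom
step = 0.5       # 1 / L, where L = 2 is the largest eigenvalue of D D^T

sparse = sparse_code(x, atoms, lam=0.01, step=step)  # sparse solution
dense = sparse_code(x, atoms, lam=0.0, step=step)    # least-squares solution
print("sparse:", sparse)
print("dense: ", dense)
```

In this toy case the λ = 0 solution spreads its activity over all three atoms, while λ = 0.01 leaves only the diagonal atom active, at the cost of a slightly shrunken coefficient. This is the sense in which a non-zero coefficient behaves like a single active neuron.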


Summary

Introduction

The perception of pitch is an important feature of speech recognition and of the perception of musical melodies. The pitch class, or pitch chroma, is the set of all pitches separated by a whole number of octaves, a relation known in music theory as "octave equivalence"; the pitch height is the continuous perception of sound from low to high. A unique property of pitch perception is that it is a many-to-many mapping: a similar pitch can be evoked by different acoustic stimuli, and a given acoustic stimulus can yield different pitch percepts. This property makes pitch an interesting feature of the mind, but it also makes pitch hard to explain. The question arises: how does the brain manage to perform this task?
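The distinction between pitch height and pitch chroma can be sketched numerically. The example below assumes the common approximation that height follows the base-2 logarithm of frequency, measured in octaves relative to a reference; the 440 Hz reference and the function names are assumptions for illustration and do not come from the paper.

```python
import math


def pitch_height(freq_hz, ref_hz=440.0):
    """Height in octaves above (positive) or below (negative) the reference."""
    return math.log2(freq_hz / ref_hz)


def pitch_chroma(freq_hz, ref_hz=440.0):
    """Position within the octave, in [0, 1); equal for octave-related tones."""
    height = pitch_height(freq_hz, ref_hz)
    return height - math.floor(height)


for f in (220.0, 440.0, 880.0, 660.0):
    print(f, pitch_height(f), pitch_chroma(f))
```

Tones at 220, 440, and 880 Hz differ in height by whole octaves but share the same chroma, which is octave equivalence; 660 Hz falls elsewhere within the octave. This captures only the pure-tone case and says nothing about the many-to-many mapping for complex stimuli.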

