Abstract
Although a beating tone and the two pure tones which give rise to it are linearly dependent, the ear perceives them as independent tone sensations. A linear time-frequency representation of acoustic data is unable to model this phenomenon. A time-tone sensation approach is proposed for inclusion within audio analysis systems. The proposed approach extends linear time-frequency analysis of acoustic data by accommodating the nonlinear phenomenon of beats. The method replaces the one-dimensional tonotopic axis of linear time-frequency analysis with a two-dimensional tonotopic plane, in which one direction corresponds to tone and the other to its frequency of modulation. Some applications to audio prostheses are discussed. The proposed method relies on an intuitive criterion of optimal representation which can be applied to any overcomplete signal basis, allowing for many signal processing applications.
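The linear dependence noted above can be checked directly: the sum of two pure tones at nearby frequencies is algebraically identical to a single carrier at the mean frequency, amplitude-modulated at half the difference frequency. The sketch below (with arbitrarily chosen sample rate and frequencies, not taken from the paper) verifies this identity numerically.

```python
import numpy as np

# Illustrative sketch of the beat phenomenon: the parameters below
# (sample rate, tone frequencies) are arbitrary assumptions, not values
# from the paper.
fs = 8000.0                       # sample rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)  # one second of time samples
f1, f2 = 440.0, 444.0            # two nearby pure-tone frequencies (Hz)

# Linear superposition of the two pure tones.
two_tones = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Equivalent product form: a carrier at (f1 + f2)/2 modulated by an
# envelope at (f1 - f2)/2.
carrier = np.cos(2 * np.pi * (f1 + f2) / 2 * t)
envelope = 2 * np.cos(2 * np.pi * (f1 - f2) / 2 * t)
beating = envelope * carrier

# The two forms are numerically identical: the beating waveform is
# linearly dependent on the two pure tones, even though the ear treats
# the beat as a separate tone sensation.
assert np.allclose(two_tones, beating)
```

Because the beating waveform adds no new linear component, a linear time-frequency representation cannot assign it an independent descriptor, which motivates the two-dimensional tonotopic plane proposed in the paper.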
Highlights
Speech recognition is a hierarchical process consisting of four main phases: audio analysis, speech feature extraction, pattern classification, and language processing [1, 2]
E3 is not excluded from the signal basis, but is included, as it corresponds to an independent tone sensation
The only requirement on this choice is that it satisfy an intuitive criterion of optimality: signals which correspond to independent tone sensations are transformed into linearly independent signal descriptors
Summary
Speech recognition is a hierarchical process consisting of four main phases: audio analysis, speech feature extraction, pattern classification, and language processing [1, 2]. The overlapping linear filter model fails to account for essentially nonlinear phenomena of human audition, such as masking, beats, and the sensation of Tartini or combination tones [4, 5]. Nonlinear models of the early stages of human audition draw heavily on our understanding of psychoacoustics and the neurophysiology of audition. These include the nonlinear mechanics of the cochlea [20], the use of wave digital filtering for the analysis of Tartini tones [21], adaptive-Q circuits [22], and formant tracking based on temporal analysis of a nonlinear cochlear model [23]. It is hypothesized that such models may be useful for efficient audio analysis in the fields of psychoacoustics, speech analysis, and audio prostheses.