Abstract

We claim that speech analysis algorithms should be based on computational models of human audition, starting at the ears. While much is known about how hearing works, little of this knowledge has been applied in the speech analysis field. We propose models of the inner ear, or cochlea, which are expressed as time- and place-domain signal processing operations; i.e. the models are computational expressions of the important functions of the cochlea. The main parts of the models concern mechanical filtering effects and the mapping of mechanical vibrations into neural representation. Our model cleanly separates these effects into time-invariant linear filtering based on a simple cascade/parallel filterbank network of second-order sections, plus transduction and compression based on half-wave rectification with a nonlinear coupled automatic gain control network. Compared to other speech analysis techniques, this model does a much better job of preserving important detail in both time and frequency, which is important for robust sound analysis. We discuss the ways in which this model differs from more detailed cochlear models.
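The processing chain the abstract describes, a cascade of second-order filter sections whose stage outputs are half-wave rectified and then compressed by a coupled automatic gain control, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the filter design (a standard biquad band-pass resonator), the Q value, the channel center frequencies, and the gain law `gain = 1 / (1 + level/target)` are all assumptions chosen for brevity.

```python
import math

def design_bandpass_sos(fc, q, fs):
    """Biquad band-pass coefficients (RBJ cookbook, 0 dB peak gain)
    for center frequency fc (Hz) at sample rate fs (Hz).  A generic
    stand-in for the paper's second-order cochlear filter stages."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(b, a, x):
    """Run one second-order section over x (direct form II transposed)."""
    z1 = z2 = 0.0
    y = []
    for s in x:
        out = b[0] * s + z1
        z1 = b[1] * s - a[1] * out + z2
        z2 = b[2] * s - a[2] * out
        y.append(out)
    return y

def cochlea_cascade(x, fs, centers, q=4.0):
    """Cascade of second-order sections, high to low center frequency,
    tapping and half-wave rectifying each stage's output: the
    time-invariant linear filtering plus detection stages."""
    taps = []
    stage = x
    for fc in centers:
        b, a = design_bandpass_sos(fc, q, fs)
        stage = biquad(b, a, stage)
        taps.append([max(s, 0.0) for s in stage])  # half-wave rectify
    return taps

def coupled_agc(channels, eps=0.002, target=0.05):
    """Compressive AGC whose gain in each channel tracks a smoothed
    output level averaged with the immediate neighbor channels --
    a crude stand-in for the paper's coupled AGC network."""
    n = len(channels)
    gains = [1.0] * n
    levels = [0.0] * n
    out = [[] for _ in range(n)]
    for t in range(len(channels[0])):
        for c in range(n):
            y = channels[c][t] * gains[c]
            out[c].append(y)
            levels[c] += eps * (y - levels[c])  # leaky level detector
        for c in range(n):
            lo, hi = max(0, c - 1), min(n, c + 2)
            coupled = sum(levels[lo:hi]) / (hi - lo)
            gains[c] = 1.0 / (1.0 + coupled / target)  # compressive gain
    return out
```

Feeding a louder signal through the same chain yields a less-than-proportional increase in channel output, which is the compression the abstract refers to; in the real model the filter shapes, tap structure, and AGC coupling follow from cochlear mechanics rather than being chosen ad hoc as here.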


