Abstract

This paper describes a system for processing sonorant regions of speech, motivated by knowledge of the human auditory system. The spectral representation is intended to reflect a proposed model for human auditory processing of speech, which takes advantage of synchrony in the nerve firing patterns to enhance formant peaks. The auditory model is also applied to pitch extraction, and thus a temporal pitch processor is envisioned. The spectrum is derived from the outputs of a set of linear filters with critical bandwidths. Saturation and adaptation are incorporated for each filter independently. Each “spectral” coefficient is determined by weighting the amplitude response at that frequency (corresponding to mean firing rate) by a measure of synchrony to the center frequency of the filter. Pitch is derived from a waveform generated by adding the (weighted) rectified filter outputs across the frequency dimension. System performance is evaluated by processing a variety of signals, including natural and synthetic speech, and the results are compared with other processing methods and with known psychoacoustical data for these types of stimuli. [Work supported in part by NINCDS and the System Development Foundation.]
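
As a rough illustration of the processing chain the abstract outlines, the Python sketch below implements one plausible reading of it. The filter shapes (Butterworth band-pass filters standing in for critical-band auditory filters), the tanh saturation, the period-delayed correlation used as the synchrony measure, and the autocorrelation step that extracts a pitch estimate from the summed waveform are all assumptions of this sketch, not the authors' implementation, and the per-filter adaptation stage is omitted. The function and variable names (critical_band_filterbank, synchrony_weighted_spectrum, pitch_from_summed_waveform) are illustrative only. The sketch follows the stated steps: filter, rectify, saturate, weight each channel's mean-rate response by its synchrony to the channel center frequency, and sum the weighted rectified outputs across frequency to obtain a pitch waveform.

    # Hedged sketch of the processing described in the abstract.
    # Filter shapes, the saturation stage, the synchrony measure, and the
    # pitch step are placeholders chosen for illustration.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def critical_band_filterbank(fs, n_channels=24, f_lo=100.0, f_hi=5000.0):
        """Placeholder bank of band-pass filters with roughly constant-Q
        bandwidths, standing in for critical-band auditory filters."""
        cfs = np.geomspace(f_lo, f_hi, n_channels)
        sos_bank = []
        for cf in cfs:
            bw = 0.2 * cf                      # crude "critical" bandwidth
            sos_bank.append(butter(2, [cf - bw / 2, cf + bw / 2],
                                   btype="bandpass", fs=fs, output="sos"))
        return cfs, sos_bank

    def synchrony_weighted_spectrum(x, fs, cfs, sos_bank):
        """Mean-rate response of each channel weighted by a synchrony measure,
        plus the across-frequency sum of the weighted rectified outputs."""
        spectrum = np.zeros(len(cfs))
        summed = np.zeros_like(x)
        for k, (cf, sos) in enumerate(zip(cfs, sos_bank)):
            y = sosfiltfilt(sos, x)
            r = np.maximum(y, 0.0)                         # half-wave rectification
            r = np.tanh(3.0 * r / (np.std(x) + 1e-12))     # saturating nonlinearity (placeholder)
            mean_rate = r.mean()
            # Placeholder synchrony measure: correlation of the rectified output
            # with itself delayed by one period of the channel center frequency.
            lag = max(1, int(round(fs / cf)))
            a, b = r[:-lag], r[lag:]
            denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
            synchrony = max(0.0, float((a * b).sum() / denom))
            spectrum[k] = mean_rate * synchrony
            summed += synchrony * r                        # weighted rectified output, summed over frequency
        return spectrum, summed

    def pitch_from_summed_waveform(summed, fs, f0_min=60.0, f0_max=400.0):
        """Assumed pitch step: strongest autocorrelation peak of the summed
        waveform within a plausible F0 range."""
        s = summed - summed.mean()
        ac = np.correlate(s, s, mode="full")[len(s) - 1:]
        lo, hi = int(fs / f0_max), int(fs / f0_min)
        lag = lo + int(np.argmax(ac[lo:hi]))
        return fs / lag

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(int(0.1 * fs)) / fs
        # Toy "sonorant" test signal: harmonics of a 120 Hz fundamental.
        x = sum(np.sin(2 * np.pi * 120 * h * t) / h for h in range(1, 12))
        cfs, bank = critical_band_filterbank(fs)
        spec, summed = synchrony_weighted_spectrum(x, fs, cfs, bank)
        print("estimated F0 (Hz):", round(pitch_from_summed_waveform(summed, fs), 1))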
