Abstract

Recognizing speech requires the identification and analysis of temporally distributed cues. A system must either examine a sufficiently long window at a single glance or else internally accumulate stimulus information. Sequential networks follow the second path by storing information internally in state nodes. Feed-forward networks do not maintain their history internally and thus require that the speech signal be presented in fixed windows. The performance of sequential and feed-forward networks on the recognition of auditorily preprocessed stop-vowel syllables is compared. Several feed-forward networks were trained by presenting a whole syllable to the network as a single token and requiring categorization. The sequential networks were more robust within and across speakers than the feed-forward networks. Unfortunately, using the back-propagation algorithm to train a sequential network requires presentation of a desired output at every time slice, which forces arbitrary choices in specifying target outputs. The results suggest that techniques based on back-propagation will prove inadequate for training networks to perform speech categorization across variations in speaker. This difficulty could be obviated by employing a learning paradigm that does not require immediate feedback, such as a stochastic learning algorithm. [Work supported by NSF.]
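
The following is a minimal illustrative sketch (not the paper's actual models) of the architectural contrast the abstract describes: a feed-forward network that must see the whole syllable as one fixed window, versus a sequential network that accumulates stimulus information in state nodes, one time slice at a time. All dimensions, weight names, and the comment on per-slice targets are assumptions chosen only to show the structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper): 16 spectral channels
# per time slice, 20 slices per syllable, 8 hidden/state units, 6 classes.
n_channels, n_slices, n_hidden, n_classes = 16, 20, 8, 6


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feedforward_pass(syllable, W_in, W_out):
    """Feed-forward net: the whole syllable is flattened into one fixed
    window and categorized at a single glance."""
    x = syllable.reshape(-1)          # (n_slices * n_channels,)
    h = sigmoid(W_in @ x)             # hidden layer
    return sigmoid(W_out @ h)         # class scores for the whole syllable


def sequential_pass(syllable, W_in, W_state, W_out):
    """Sequential net: stimulus information is accumulated internally in
    state nodes, one time slice at a time."""
    state = np.zeros(n_hidden)
    outputs = []
    for t in range(syllable.shape[0]):
        state = sigmoid(W_in @ syllable[t] + W_state @ state)
        # Back-propagation training would require a desired output here,
        # at every time slice -- the constraint the abstract criticizes.
        outputs.append(sigmoid(W_out @ state))
    return outputs                    # one output per time slice


# Random weights and a random "syllable" purely to show the shapes involved.
syllable = rng.normal(size=(n_slices, n_channels))
ff_out = feedforward_pass(
    syllable,
    rng.normal(size=(n_hidden, n_slices * n_channels)),
    rng.normal(size=(n_classes, n_hidden)),
)
seq_out = sequential_pass(
    syllable,
    rng.normal(size=(n_hidden, n_channels)),
    rng.normal(size=(n_hidden, n_hidden)),
    rng.normal(size=(n_classes, n_hidden)),
)
print(ff_out.shape, len(seq_out))     # (6,) 20
```

The sketch makes the training asymmetry visible: the feed-forward pass yields one output per syllable, so a single category label suffices, whereas the sequential pass yields an output at every time slice, so back-propagation training needs a target at each slice and some target choices must be made arbitrarily.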
