Abstract
A number of problems in speech recognition arise through the treatment of speech as a linear temporal sequence. These include word onset detection, temporal normalisation and variations in pronunciation. It is suggested, following Wickelgren (1969, 1972), that speech should instead be represented by a non-sequential associative or context-sensitive code. ERIS, a computer speech recogniser based on a set of independent context-sensitive coded demons, demonstrates the validity and power of such an approach. Ways of incorporating absolute time information into a context-sensitive code are discussed, together with the possible need for, and nature of, intermediate levels of processing between the acoustic stimulus and a word or morpheme representation. Rather than postulating any such units in advance, it is suggested that by considering word recognition as an acoustic–lexical mapping, it will become apparent what intermediate levels are either necessary or useful. The power of even a relatively simple recognition system based on context-sensitive coding and direct acoustic–lexical mapping suggests that these are important principles to be considered in any approach to understanding, modelling and simulating human speech perception.
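The core idea can be illustrated with a minimal sketch of Wickelgren-style context-sensitive coding: each segment is encoded together with its immediate left and right neighbours, so a word becomes an unordered set of context-sensitive triples rather than a linear sequence. The phoneme symbols and the function below are illustrative assumptions, not taken from ERIS itself.

```python
def context_sensitive_code(phonemes):
    """Return the set of (left, centre, right) triples for a word,
    padding the word boundaries with '#'.

    Because the result is a set, no linear time axis survives; yet
    each triple still records local order, so recognition can match
    against a lexicon without explicit temporal alignment."""
    padded = ["#"] + list(phonemes) + ["#"]
    return {(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)}

# "struck" rendered (illustratively) as /s t r ʌ k/
code = context_sensitive_code(["s", "t", "r", "ʌ", "k"])
```

A set of such triples is the kind of non-sequential associative representation the abstract argues for: it sidesteps word-onset detection and temporal normalisation, because matching is done over an unordered collection of locally ordered units.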