Abstract
In this paper, a novel approach to connected spoken word recognition is proposed, based solely on a relatively simple artificial neural network model. The model is a modified version of the previously proposed cascaded neuro-computational model and has a three-layered network structure, in which a non-linear metric is newly introduced for each of the second-layer units so that pattern matching at the word-feature level is performed effectively. Simulations were conducted using connected speech data sets with a larger lexicon than those used in previous works; the data sets comprised naturally spoken strings, each consisting of 2–7 words selected from a total of 47 Japanese prefecture names. The simulation results show that the modified model achieves an overall word accuracy rate of 95.2%, which is comparable to the 98.1% obtained using a benchmark hidden Markov model approach with embedded training.