This paper presents a new method, the time-slicing paradigm, for recognizing temporal patterns with neural networks. The method analyzes the speech signal with the aim of recognizing connected speech with less preprocessing of the input signal than existing neural network approaches require. Along with the time-slicing paradigm, this work introduces the concept of “natural” connectionist glue. Using the time-slicing method and the natural connectionist glue, the time-sliced recurrent cascade-correlation network (TS-RCCN) was trained to recognize Japanese phonemes. This network uses a parallel-modular version of Fahlman's recurrent cascade-correlation (RCC) learning architecture. The input to the network does not have to be labeled or segmented phoneme by phoneme; instead, it is divided into small sequential time-slices that do not overlap in time. The network processes one slice after another regardless of the total length of the input signal and produces an immediate recognition hypothesis for each slice of the processed phoneme sequence. Training uses small portions of every phoneme taken from a subset of a list of recorded words. Testing, however, uses whole words without prealignment or segmentation. The results show great promise for real-time recognition of connected speech: the strength of the TS-RCCN method is that its computational cost is lower than that of other neural-network-based methods, low enough to run even on a PC. These advantages make the method viable for spontaneous continuous speech recognition. Furthermore, its application is not limited to speech; it can also be used for other kinds of temporal patterns.
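The slice-by-slice processing described in the abstract can be illustrated with a minimal Python sketch. It is not the TS-RCCN itself: the names `time_slices`, `recognize`, and `toy_step` are hypothetical, the slice length is arbitrary, dropping a trailing partial slice is an assumption, and the recurrent network is replaced by a toy stand-in that carries a single number as state.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the recurrent state; a real network's state is richer.
State = float
Step = Callable[[Sequence[float], State], Tuple[str, State]]


def time_slices(signal: Sequence[float], slice_len: int) -> List[Sequence[float]]:
    """Split a signal into consecutive, non-overlapping slices.

    Assumption: a trailing remainder shorter than slice_len is dropped.
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")
    n_full = len(signal) // slice_len
    return [signal[i * slice_len:(i + 1) * slice_len] for i in range(n_full)]


def recognize(signal: Sequence[float], slice_len: int, step: Step,
              init_state: State = 0.0) -> List[str]:
    """Process one slice after another, emitting one hypothesis per slice.

    The state returned by each step is passed to the next one, so the
    total length of the input never has to be known in advance.
    """
    state = init_state
    hypotheses = []
    for s in time_slices(signal, slice_len):
        label, state = step(s, state)
        hypotheses.append(label)
    return hypotheses


def toy_step(s: Sequence[float], state: State) -> Tuple[str, State]:
    """Toy replacement for the trained network (illustration only).

    Smooths the mean absolute amplitude across slices and thresholds it.
    """
    energy = sum(abs(x) for x in s) / len(s)
    state = 0.5 * state + 0.5 * energy
    return ("high" if state > 0.25 else "low"), state
```

Calling `recognize(samples, 4, toy_step)` returns one label per complete four-sample slice; in the setting of the abstract, the step function would be the trained network and the labels would be phoneme hypotheses.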