In cognitive science research on natural language processing, motor learning, and visual perception, perceiving boundary points and segmenting a continuous string or sequence is a fundamental problem. Boundary perception can also be framed as a machine learning problem, either supervised or unsupervised. The supervised approach to determining boundary points for segmenting a sequence requires pre-segmented training examples. In the unsupervised setting, learning proceeds without any training data, so the frequency of occurrence of symbols within the sequence is normally used as the cue. Most earlier algorithms use this cue while scanning the sequence in the forward direction. In this paper we propose a novel approach that extracts candidate boundary points by scanning the sequence bi-directionally. We show that the extension from unidirectional to bi-directional scanning is not trivial and requires judicious consideration of the data structure and the algorithm. We propose a new algorithm that traverses the sequence unidirectionally but extracts information bi-directionally. Our method yields better segmentation, as demonstrated by rigorous experiments on several datasets.
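To make the frequency cue concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes that the cue is a bigram transitional probability, computed in the forward direction as P(next | current) and in the backward direction as P(previous | current), and that a gap is a boundary when the mean of the two falls below a threshold. The function names `boundary_scores` and `segment`, the bigram window, and the threshold value are all hypothetical choices made for illustration.

```python
from collections import Counter


def boundary_scores(seq):
    """Score every gap in seq with forward and backward transitional probabilities.

    Illustrative assumption: the frequency cue is a bigram statistic.
    Returns a list of (gap_position, forward_tp, backward_tp), where a gap at
    position p lies between seq[p - 1] and seq[p].
    """
    left_counts = Counter(seq[:-1])        # symbols that have a successor
    right_counts = Counter(seq[1:])        # symbols that have a predecessor
    pair_counts = Counter(zip(seq, seq[1:]))

    scores = []
    for i in range(len(seq) - 1):
        x, y = seq[i], seq[i + 1]
        pair = pair_counts[(x, y)]
        forward = pair / left_counts[x]    # P(y follows x)
        backward = pair / right_counts[y]  # P(x precedes y)
        scores.append((i + 1, forward, backward))
    return scores


def segment(seq, threshold=0.9):
    """Cut seq at gaps whose mean forward/backward score is below threshold.

    The threshold and the use of the mean are hypothetical choices.
    """
    pieces, start = [], 0
    for pos, forward, backward in boundary_scores(seq):
        if (forward + backward) / 2 < threshold:
            pieces.append(seq[start:pos])
            start = pos
    pieces.append(seq[start:])
    return pieces


print(segment("abxabyabz"))  # ['ab', 'x', 'ab', 'y', 'ab', 'z']
```

In this toy input, the forward score alone is low only after each `b`, and the backward score alone is low only before each `a`; combining them recovers both edges of the recurring unit `ab`. The counts here are gathered from the whole sequence before scoring, which is a simplification chosen for brevity.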