Abstract

Learning Bayesian Belief Networks (BBN) from corpora and combining the extracted inferential knowledge with a Support Vector Machine (SVM) classifier has been applied to character segmentation of unconstrained handwritten text. By taking advantage of the plethora of unlabeled data found in image databases, in addition to some available labeled examples, we avoid the expensive task of annotating the whole training set and improve the performance of the character segmentation learner. Apart from this approach, which has not previously been used for this task, we have experimented with two well-known machine learning methods (Learning Vector Quantization and a simplified version of Transformation-Based Learning). We argue that a classifier generated from BBN and SVM is well suited to learning to identify the correct segment boundaries, and empirical results support this claim. Performance has been methodically evaluated on both English and Modern Greek corpora in order to verify the unbiased behaviour of the trained models. Limited training data prove sufficient to yield satisfactory results. We have been able to achieve precision exceeding 86%.
