Word segmentation in handwritten Korean text lines based on gap clustering techniques

S.H. Kim, Guee-Sang Lee, S. Jeong, C.Y. Suen

doi:10.1109/icdar.2001.953781

Abstract

We propose a word segmentation method for handwritten Korean text lines. The method uses gap information to separate a text line into word units, where a gap is defined as a white run in the vertical projection of the line image. Each gap is classified as either a between-word gap or a within-word gap using a clustering technique. We consider three gap metrics that are known to perform well in Roman-style word segmentation: the bounding box (BB), run-length/Euclidean (RLE) and convex hull (CH) distances. We also consider three clustering techniques: the average linkage method, the modified MAX method and sequential clustering. An experiment with 498 text-line images extracted from live mail pieces showed that the best performance is obtained by the sequential clustering technique using all three gap metrics.
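The gap-based pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a binary line image given as a list of rows (1 = ink, 0 = background), uses the plain white-run width as the only gap feature instead of the BB, RLE and CH distances, and uses a simple average-linkage agglomeration into two clusters. The function names `white_runs` and `classify_gaps` are hypothetical.

```python
def white_runs(image):
    """Return (start, length) for each interior white run of the vertical projection.

    `image` is a list of rows, each a list of 0 (background) / 1 (ink).
    Leading and trailing margins are not reported as gaps.
    """
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    runs, start = [], None
    for x, count in enumerate(projection):
        if count == 0 and start is None:
            start = x                      # a white run begins
        elif count > 0 and start is not None:
            runs.append((start, x - start))  # the run ends at an ink column
            start = None
    # A run still open at the end is the trailing margin and is never appended;
    # a run starting at column 0 is the leading margin and is dropped here.
    return [r for r in runs if r[0] > 0]


def classify_gaps(widths):
    """Label each gap True (between-word) or False (within-word).

    Assumption: gaps are merged by average linkage on width until two
    clusters remain, and the cluster with the larger mean width is taken
    to hold the between-word gaps.
    """
    if len(widths) < 2:
        return [False] * len(widths)
    clusters = [[i] for i in range(len(widths))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                # Average linkage: mean pairwise distance between the clusters.
                dist = sum(abs(widths[i] - widths[j]) for i, j in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid

    def mean_width(cluster):
        return sum(widths[i] for i in cluster) / len(cluster)

    wide = max(clusters, key=mean_width)
    return [i in wide for i in range(len(widths))]


# Usage: a one-row toy "line image" with three interior gaps.
line = [[0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0]]
gaps = white_runs(line)
labels = classify_gaps([length for _, length in gaps])
```

Because this sketch always forces two clusters, a line that contains only within-word gaps would still have its widest gaps labeled as between-word; a practical system would need an additional check for that case.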

Full Text