Abstract

Character segmentation is a fundamental but challenging prerequisite step in historical document image analysis. In this paper, we propose a novel approach to handwritten character segmentation in historical documents that treats the task as a sequence labeling problem. Specifically, the proposed model first segments the document image into lines; each pixel column of a line image is then assigned a label indicating whether or not it is a segmentation position. The labeling is performed by a neural model that combines a CNN for feature extraction, an LSTM for sequence modeling, and a CRF for sequence labeling. We evaluate the method on a 300-page dataset containing 96,479 characters. The experimental results demonstrate that the proposed method achieves superior or highly competitive performance compared with other methods.
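The column-labeling formulation can be illustrated with a minimal sketch. It shows only the label encoding, not the CNN, LSTM, or CRF model itself, and it assumes a binary label set (1 for a segmentation column, 0 otherwise); the abstract does not specify the exact tag scheme. The function names `columns_to_labels` and `labels_to_segments` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def columns_to_labels(width: int, cut_columns: List[int]) -> List[int]:
    """Encode cut positions as one binary label per pixel column.

    Assumption: label 1 marks a segmentation column, label 0 any other column.
    """
    cuts = set(cut_columns)
    for c in cuts:
        if not 0 <= c < width:
            raise ValueError(f"cut column {c} is outside the line image (width {width})")
    return [1 if x in cuts else 0 for x in range(width)]


def labels_to_segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Decode per-column labels into half-open character spans (start, end)."""
    segments = []
    start = 0
    for x, label in enumerate(labels):
        if label == 1:
            # Close the current span just before the segmentation column.
            if x > start:
                segments.append((start, x))
            start = x + 1
    # Keep the trailing span after the last segmentation column, if any.
    if start < len(labels):
        segments.append((start, len(labels)))
    return segments


if __name__ == "__main__":
    labels = columns_to_labels(10, [3, 7])
    print(labels)
    print(labels_to_segments(labels))
```

In this encoding, a sequence labeler would predict one label per column of the line image, and the decoded spans would give the horizontal extent of each character.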
