Abstract

Offline writer identification plays an important role in forensic document examination and historical document analysis. Historical writer identification (WI) remains challenging because historical documents may exhibit very complex handwriting styles. In this paper, we propose novel techniques for the detailed description and accurate identification of handwriting in historical documents. Because handwriting contours are among the most salient cues characterizing a person's handwriting style, we propose a novel pathlet feature that describes their rich properties beyond slant and curvature in a principled way; these properties can be exploited in a VLAD-like encoding framework for fine-grained handwriting description. In addition to the pathlet feature, we extract unidirectional SIFT features to describe handwriting corners and junctions. To encode the pathlet and SIFT features effectively, we further propose a novel encoding method, named bagged VLAD. It addresses the problem that a large codebook spreads the data points sparsely across codewords and degrades performance, and thereby allows a much larger codebook with improved encoding performance. Our method achieves state-of-the-art performance on the ICDAR2017 Historical-WI database and the ICDAR2019 HDRC-IR database, and won first place in the ICDAR2019 HDRC-IR competition.
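The abstract builds on VLAD-style encoding but does not spell out how bagged VLAD works, so the sketch below shows only standard VLAD. It uses hard assignment to the nearest codeword, per-codeword residual sums, and signed square-root plus L2 normalization, written in pure Python. It also includes a hypothetical `assignment_counts` helper that makes the sparsity problem concrete. With a fixed set of descriptors, enlarging the codebook leaves more codewords with few or no assigned points, and their residual blocks stay zero. The function names and normalization choices are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_center(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for i, c in enumerate(centers):
        dist = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def assignment_counts(descriptors: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: how many descriptors each codeword receives.

    With a large codebook, many entries are 0, i.e. the data is spread sparsely.
    """
    counts = [0] * len(centers)
    for x in descriptors:
        counts[nearest_center(x, centers)] += 1
    return counts


def vlad_encode(descriptors: Sequence[Sequence[float]],
                centers: Sequence[Sequence[float]]) -> Vector:
    """Standard VLAD (not bagged VLAD): sum residuals per codeword, then normalize."""
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        i = nearest_center(x, centers)
        for j in range(d):
            residuals[i][j] += x[j] - centers[i][j]
    # Flatten the k x d residual matrix into a single k*d-dimensional vector.
    v = [val for row in residuals for val in row]
    # Signed square-root (power) normalization dampens dominant components.
    v = [math.copysign(math.sqrt(abs(val)), val) for val in v]
    # L2 normalization so that encodings can be compared by inner product.
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0 else v
```

For example, `vlad_encode([[0.0, 1.0], [0.2, 0.8], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]])` gives approximately `[0.707, -0.707, 0.0, 0.0]`. The second codeword's block is zero because the only descriptor assigned to it coincides with the codeword. Adding a third, distant codeword such as `[5.0, 5.0]` leaves it with no descriptors at all, which is the kind of sparsity a much larger codebook produces.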
