A clustering method for graphical handwriting components and statistical writership analysis.

Amy M Crawford, Nicholas S Berry, Alicia L Carriquiry

doi:10.1002/sam.11488

Open Access

https://doi.org/10.1002/sam.11488

Journal: Statistical Analysis and Data Mining
Publication Date: Nov 24, 2020
Citations: 7
License type: CC BY 4.0

Affiliation: Iowa State University

Abstract

Handwritten documents can be characterized by their content or by the shape of the written characters. We focus on the problem of comparing a person's handwriting to a document of unknown provenance using the shape of the writing, as is done in forensic applications. To do so, we first propose a method for processing scanned handwritten documents to decompose the writing into small graphical structures, often corresponding to letters. We then introduce a measure of distance between two such structures that is inspired by the graph edit distance, and a measure of center for a collection of the graphs. These measurements are the basis for an outlier tolerant K‐means algorithm to cluster the graphs based on structural attributes, thus creating a template for sorting new documents. Finally, we present a Bayesian hierarchical model to capture the propensity of a writer for producing graphs that are assigned to certain clusters. We illustrate the methods using documents from the Computer Vision Lab dataset. We show results of the identification task under the cluster assignments and compare to the same modeling, but with a less flexible grouping method that is not tolerant of incidental strokes or outliers.
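
As a rough illustration of the outlier-tolerant K-means idea described in the abstract, the sketch below clusters plain 2-D points rather than handwriting graphs. It is not the authors' algorithm: the Euclidean distance stands in for the graph distance, the coordinate mean stands in for the measure of center for graphs, and the fixed cutoff `max_dist` is a hypothetical rule for setting outliers aside. All function and parameter names are assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Stand-in for the paper's graph distance (assumption: points, not graphs)."""
    return math.dist(a, b)


def mean_point(points):
    """Stand-in for the paper's measure of center for a collection of graphs."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def outlier_tolerant_kmeans(items, centers, max_dist, n_iter=20):
    """K-means-type loop that sets aside items farther than max_dist from every center.

    Returns (centers, labels); a label of -1 marks an item treated as an outlier.
    """
    centers = list(centers)
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center, or -1 when no center is close enough.
        new_labels = []
        for item in items:
            dists = [euclidean(item, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            new_labels.append(best if dists[best] <= max_dist else -1)
        # Update step: outliers do not contribute to any center.
        for k in range(len(centers)):
            members = [x for x, lab in zip(items, new_labels) if lab == k]
            if members:
                centers[k] = mean_point(members)
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
centers, labels = outlier_tolerant_kmeans(
    points, centers=[(0.0, 0.0), (10.0, 10.0)], max_dist=5.0
)
print(labels)
print(centers)
```

In this toy run the point far from both groups receives the label -1 and does not pull either center toward it, which is the behavior the abstract contrasts with a grouping method that is not tolerant of incidental strokes or outliers.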

Highlights

Many disciplines rely on the ability to parse, process, and analyze handwritten text.
The groupings that result from the dynamical clustering method we propose are more parsimonious, descriptive, and repeatable for writers than deterministic groupings, because of their robustness to small structural differences among graphs.
We use handwriter only for document processing in this work, but the software has other feature extraction capabilities, such as finding centroids, slants, loops, and other measurable attributes for each graph. These features are important for forensic handwriting analysis, but they are not used to create the clustering template that is the focus here, and we do not discuss them further.

Summary

INTRODUCTION

Many disciplines rely on the ability to parse, process, and analyze handwritten text. We use scanned handwritten documents from a variety of writers in the Computer Vision Lab (CVL) database [8] to meet our goals. In this writership analysis framework, feature extraction occurs in two stages. A question of interest is whether writers can be distinguished by the proportion of the graphs extracted from their writing that fall into each of the k clusters in the template. To address this question, we use the observed cluster frequencies in a document by a writer as the response variable in a hierarchical model to estimate the posterior probability of writership for each writer in a closed set. The groupings that result from the dynamical clustering method we propose are more parsimonious, descriptive, and repeatable for writers than deterministic groupings, because of their robustness to small structural differences among graphs.
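
To make the closed-set writership question concrete, the sketch below computes posterior probabilities of writership from cluster counts under a deliberately simplified model. It is not the hierarchical model of the paper: it assumes an independent Dirichlet-multinomial model per writer, a symmetric Dirichlet prior with a hypothetical concentration `alpha`, and a uniform prior over writers. The writer names and counts are invented for illustration.

```python
import math


def log_beta(a):
    """Log of the multivariate beta function B(a)."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(sum(a))


def writership_posterior(train_counts, questioned, alpha=1.0):
    """Posterior probability of each writer in a closed set.

    train_counts: dict mapping writer -> list of cluster counts from known documents.
    questioned:   list of cluster counts from the questioned document.
    alpha:        symmetric Dirichlet prior concentration (an assumption of this sketch).
    """
    log_scores = {}
    for writer, counts in train_counts.items():
        # Dirichlet parameters after seeing the writer's known documents.
        a = [alpha + c for c in counts]
        a_new = [x + y for x, y in zip(a, questioned)]
        # Log predictive probability of the questioned counts, up to the
        # multinomial coefficient, which is identical for every writer.
        log_scores[writer] = log_beta(a_new) - log_beta(a)
    # Normalize with the max subtracted for numerical stability.
    m = max(log_scores.values())
    weights = {w: math.exp(s - m) for w, s in log_scores.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


# Invented counts over k = 3 clusters for three hypothetical writers.
train = {
    "writer_A": [30, 5, 5],
    "writer_B": [5, 30, 5],
    "writer_C": [5, 5, 30],
}
post = writership_posterior(train, questioned=[2, 12, 1])
for writer, prob in sorted(post.items()):
    print(writer, round(prob, 4))
```

The writer whose known documents concentrate on the same clusters as the questioned document receives the highest posterior probability. A hierarchical model such as the one the paper describes would additionally share information across writers, which this sketch omits.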

SEGMENTING A DOCUMENT INTO GRAPHS

Preprocessing of a handwritten document

Segmenting connected ink into graphs

Adjacency grouping

CLUSTERING ALGORITHM FOR GRAPHS

Distance measure for graphs

Distance between two edges

Graph distance measure from edge calculations

Weighted mean of graphs

K-means-type algorithm

Update To with

APPLICATION

Creating a clustering template

Writer identification

Model formulation

Prediction

Findings

DISCUSSION

Similar Papers

TaGSim
Jiyang Bai ... Peixiang Zhao
Proceedings of the VLDB Endowment | VOL. 15
01 Oct 2021

Anytime and Distributed Approaches for Graph Matching
Zeina Abu-Aisheh
ELCVIA Electronic Letters on Computer Vision and Image Analysis | VOL. 15
04 Nov 2016

An Improved Method of Graph Edit Distance for Business Process Model Similarity Measurement
Indra Waspada ... Riyanarto Sarno
10 Nov 2020

A Comparative Study of Three Graph Edit Distance Algorithms
Xinbo Gao ... Bing Xiao
01 Jan 2009
