Abstract

This article discusses handwritten text recognition (HTR) as a tool for the analysis of historical handwritten documents. We give a broad overview of this field of research, but the focus is on a method called ‘word spotting’, which finds words directly and automatically in scanned images of manuscript pages. We illustrate and evaluate this method by applying it to a medieval manuscript. Word spotting uses digital image analysis to represent stretches of writing as sequences of numerical features. These features are intended to capture the linguistically significant aspects of the visual shape of the writing. Two potential words can then be compared mathematically and their degree of similarity assigned a value. Our version of this method gives a false positive rate of about 30% at a true positive rate close to 100% when searching for very frequent short words in a 16th-century Old Swedish cursiva recentior manuscript. Word spotting would be useful, for example, to researchers who want to explore the content of manuscripts when editions or other transcriptions are unavailable.
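To make the idea of comparing two feature sequences concrete, here is a minimal sketch. The abstract does not say which comparison is used, so dynamic time warping (DTW) is swapped in here purely as an assumption; it is one common way to score sequences of different lengths. The function name `dtw_distance` and the toy one-dimensional features are hypothetical and are not taken from the article.

```python
from math import inf


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Assumption: each element is a list of numbers describing one slice of a
    word image. A lower result means the two sequences are more similar.
    """

    def local_cost(u, v):
        # Euclidean distance between two feature vectors of equal length.
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the cheapest alignment of the first i elements of seq_a
    # with the first j elements of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = local_cost(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_b element j is reused
                cost[i][j - 1],      # seq_a element i is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Toy data: the second sequence is a stretched copy of the first.
query = [[0.0], [1.0]]
candidate = [[0.0], [0.0], [1.0]]
print(dtw_distance(query, candidate))
```

A search would then rank candidate word images by this value and report those below a chosen threshold as hits; the threshold determines the trade-off between true and false positives.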

Highlights

  • True positive rate (TPR) is the proportion of hits among actual instances of the target, e.g. the number of ‘och’ instances found divided by the actual number of ‘och’ instances.

  • False positive rate (FPR) is the proportion of hits among actual non-instances of the target type, e.g. the number of incorrect instances proposed divided by the actual number of non-instances.

  • A perfect procedure would produce TPR 100% and FPR 0%, whereas one giving random verdicts would make the two measures identical, i.e. an instance and a non-instance would be equally likely to be returned as search hits.
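The two rates defined above can be computed as in the following sketch. The function name `tpr_fpr`, the integer candidate ids, and the toy numbers are assumptions made for illustration only; they are not the article's data.

```python
def tpr_fpr(hits, targets, n_candidates):
    """Return (TPR, FPR) for one search.

    hits:         ids of the candidates the search returned
    targets:      ids of the candidates that really are the target word
    n_candidates: total number of candidate words examined
    """
    hits, targets = set(hits), set(targets)
    n_negatives = n_candidates - len(targets)
    if not targets or n_negatives <= 0:
        raise ValueError("need at least one instance and one non-instance")
    true_positives = len(hits & targets)   # correct hits
    false_positives = len(hits - targets)  # hits that are not the target
    return true_positives / len(targets), false_positives / n_negatives


# Hypothetical example: 10 candidate words, 4 of which are the target.
# The search returns all 4 targets plus 2 wrong candidates.
tpr, fpr = tpr_fpr(hits={0, 1, 2, 3, 4, 5}, targets={0, 1, 2, 3}, n_candidates=10)
print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")
```

Note that FPR is divided by the number of non-instances, not by the number of hits, so it differs from the share of wrong results in the hit list.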


Summary

Spotting Words in Medieval Manuscripts

Fredrik Wahlberg (a), Mats Dahllöf (a), Lasse Mårtensson (b) & Anders Brun (a)
(a) Uppsala University, Sweden; (b) University of Gävle, Sweden
Published online: 20 Jan 2014.

To cite this article: Fredrik Wahlberg, Mats Dahllöf, Lasse Mårtensson & Anders Brun (2014) Spotting Words in Medieval Manuscripts, Studia Neophilologica, 86:sup1, 171-186, DOI: 10.1080/00393274.2013.871975

Purpose and Introduction
Physical and digitized linguistic data
Automatic analysis of handwritten documents
HTR as a component in a system for digital palaeography
Old Swedish manuscripts
The image matching method
The performance of the word spotting method
Prospects for future work in HTR
Printed Sources
