Abstract

By utilising web-based collaboration tools, institutions can engage users in building the historical printed-text resources created by mass digitisation projects. This paper presents the drivers for developing such tools and identifies the benefits that both the user community and cultural heritage institutions can derive from them. It also sets out the perceived risks, such as errors introduced by users or uncertainty about whether users will engage with resources in this way. The paper presents lessons that can be learnt from existing activities, such as the National Library of Australia’s newspaper website, which supports collaborative correction of Optical Character Recognition (OCR) output. It then details the user collaboration tools being created by the IMPACT Project (Improving Access to Text, http://www.impact-project.eu), a large-scale integrating project funded by the European Commission under the Seventh Framework Programme (FP7). A primary aim of IMPACT is to develop tools that improve OCR results for historical printed texts, specifically works published before the industrial production of books began in the mid-19th century. While technological improvements to image processing and OCR engines are key to improving access to historical text, engaging the user community also has an important role to play: user involvement can help achieve the levels of accuracy currently found in born-digital materials. Improving OCR results to this level is key to producing resources that support better resource discovery and improve the performance of text mining and accessibility tools applied to the extracted text. Specifically, the IMPACT project will develop a tool that supports collaborative correction and validation of OCR results, and will allow users to help build historical dictionaries that can be used to validate word recognition.
These technologies use characteristics of human perception as the basis for error detection.
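As an illustration only, the dictionary-based validation of word recognition described above might look like the following minimal sketch, which flags OCR tokens missing from a historical lexicon so that users can review them. This is not the IMPACT implementation: the lexicon contents, the whitespace-and-punctuation tokenisation rule and the names `validate_tokens` and `words_for_review` are hypothetical assumptions made for the example.

```python
from __future__ import annotations

import re
import string
from dataclasses import dataclass


@dataclass
class Token:
    text: str        # token with surrounding punctuation stripped
    position: int    # character offset of the raw token in the OCR text
    in_lexicon: bool # True if the (lower-cased) token is in the lexicon


def validate_tokens(ocr_text: str, lexicon: set[str]) -> list[Token]:
    """Check each whitespace-separated OCR token against a (hypothetical)
    historical lexicon of lower-cased word forms."""
    tokens = []
    for match in re.finditer(r"\S+", ocr_text):
        word = match.group().strip(string.punctuation)
        if not word:
            continue  # skip tokens made only of punctuation
        tokens.append(Token(word, match.start(), word.lower() in lexicon))
    return tokens


def words_for_review(ocr_text: str, lexicon: set[str]) -> list[str]:
    """Return tokens not found in the lexicon, i.e. candidates that a
    collaborative correction interface could present to users."""
    return [t.text for t in validate_tokens(ocr_text, lexicon) if not t.in_lexicon]


if __name__ == "__main__":
    # Hypothetical lexicon including historical spellings such as "kinge".
    lexicon = {"the", "kinge", "of", "englande", "and", "his", "ship"}
    print(words_for_review("The Kinge of Eng1ande, and his fhip.", lexicon))
```

Run on the sample line, the sketch flags `Eng1ande` (a digit misread as a letter) and `fhip` (a long s misread as an f), while accepting the historical spelling `Kinge` because it appears in the lexicon. A lexicon built with user help would therefore reduce false alarms on period spellings that a modern dictionary would reject.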