Optical character recognition of typeset Coptic text with neural networks

So Miyagawa, Heike Behlmer, Kirill Bulert, Marco Büchler

doi:10.1093/llc/fqz023

Abstract

Digital Humanities (DH) within Coptic Studies, an emerging field, will benefit greatly from the digitization of large quantities of typeset Coptic texts. Until recently, the only Optical Character Recognition (OCR) work on printed Coptic texts had been carried out by Moheb S. Mekhaiel, who used the Tesseract program to create a text model for liturgical books in the Bohairic dialect of Coptic. However, this model is not suitable for the many scholarly editions of texts in the Sahidic dialect, which use noticeably different fonts. In the current study, DH and Coptological projects based in Göttingen, Germany, collaborated to develop a new Coptic OCR pipeline suitable for all Coptic dialects. The objective was to generate a model that can facilitate digital Coptic Studies and produce Coptic corpora from existing printed texts. First, we compared the two available OCR programs that can recognize Coptic: Tesseract and Ocropy. The results indicated that the neural-network-based program, Ocropy, performed better at recognizing the letters with supralinear strokes that characterize published Sahidic texts. After training Ocropy's artificial neural network on Coptic, the team achieved an accuracy rate of >91% for the OCR of Coptic typeset. We subsequently compared the efficiency of Ocropy with that of manual transcription and concluded that using Ocropy to extract Coptic text from digital images of printed texts is highly beneficial to Coptic DH.
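
The abstract does not state how the >91% accuracy rate was computed. A common convention in OCR evaluation, assumed here and not taken from the paper, is character accuracy = 1 − (edit distance ÷ number of ground-truth characters), where the edit distance counts the insertions, deletions and substitutions needed to turn the OCR output into the ground truth. The minimal sketch below implements that definition. The names `levenshtein` and `character_accuracy` are hypothetical helpers, and encoding the supralinear stroke as U+0305 COMBINING OVERLINE is one possible convention, not necessarily the one used in the study's ground truth.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Hypothetical helper: 1 - (edit distance / ground-truth length), floored at 0."""
    # NFC normalization so that identical text is compared as identical code points.
    gt = unicodedata.normalize("NFC", ground_truth)
    out = unicodedata.normalize("NFC", ocr_output)
    if not gt:
        raise ValueError("ground truth must not be empty")
    return max(0.0, 1.0 - levenshtein(gt, out) / len(gt))


# Assumption: the supralinear stroke is stored as U+0305 COMBINING OVERLINE, a separate
# code point, so an OCR result that drops the stroke counts as one character error.
truth = "\u2c9b\u0305\u2ca7\u2c89"  # ⲛ̅ⲧⲉ
ocr = "\u2c9b\u2ca7\u2c89"          # ⲛⲧⲉ (stroke missed)
print(character_accuracy(truth, ocr))  # 0.75
```

Under this definition, a model that reads the base letters correctly but misses supralinear strokes is still penalized, which is one way the difference between recognizers on Sahidic editions could show up in an accuracy figure.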

Journal: Digital Scholarship in the Humanities
Publication Date: Apr 22, 2019
Citations: 1
License type: cc-by-sa

Similar Papers

Chapter 4 - Concepts, procedures, and applications of artificial neural network models in streamflow forecasting
Arash Malekian ... Nastaran Chitsaz
Advances in Streamflow Forecasting
01 Jan 2020

Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.
Mahmoud Mahmoudi ... Alireza Mesdaghinia
Archives of Iranian medicine | VOL. 17
01 Dec 2014

Engineering Applications of Bio-Inspired Artificial Neural Networks
Juan V Sánchez-Andrés
01 Jan 1998

Ore grade estimation of a limestone deposit in India using an Artificial Neural Network
...
Applied GIS | VOL. 2
14 Oct 2016