Abstract

Greek documentary papyri form an important direct source for Ancient Greek. This source has been exploited surprisingly little in Greek linguistics because of a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables users to transform the digital texts available in TEI EpiDoc XML format into a format that can be morphologically and syntactically annotated (treebanked), and to add new metadata concerning the text type, writer and handwriting of each act of writing. An important aspect of this process is distinguishing the original surviving writing from the editors' standardizations of the language and their supplements. This is done by creating two different layers of the same text. The platform is in an early phase of development. Ongoing and future developments, such as tagging linguistic variation phenomena and performing queries within Sematia, are discussed at the end of the article.
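The two-layer idea can be illustrated with a short sketch. It is not Sematia's implementation: it assumes only a small subset of EpiDoc markup, namely `<choice>` with `<reg>`/`<orig>` for an editor's regularized spelling and `<supplied>` for text restored by the editor. The layer names `original` and `standardized`, the functions `layer_text` and `split_layers`, and the Greek sample are all hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET

# TEI namespace used by EpiDoc files.
TEI = "{http://www.tei-c.org/ns/1.0}"


def layer_text(elem, layer):
    """Return the text of `elem` as it appears in one (hypothetical) layer.

    layer == "original": keep the writer's spelling (<orig>) and drop
        editorial supplements (<supplied>), which are not on the papyrus.
    layer == "standardized": keep the editor's regularization (<reg>)
        and include the supplements.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEI + "choice":
            # Pick exactly one alternative from <choice>.
            wanted = TEI + ("orig" if layer == "original" else "reg")
            for option in child:
                if option.tag == wanted:
                    parts.append(layer_text(option, layer))
        elif child.tag == TEI + "supplied":
            if layer == "standardized":
                parts.append(layer_text(child, layer))
        else:
            # Any other element: keep its text unchanged in both layers.
            parts.append(layer_text(child, layer))
        # Text after a child's closing tag belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def split_layers(xml_string):
    """Parse an EpiDoc fragment and return both layers as plain text."""
    root = ET.fromstring(xml_string)
    # NFC so that precomposed and combining-accent forms compare equal.
    return {
        layer: unicodedata.normalize("NFC", layer_text(root, layer))
        for layer in ("original", "standardized")
    }


# Invented example: "ὑμεῖν" is a common papyrus spelling of "ὑμῖν",
# and "περ" has been restored by the editor.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">γράφω '
    "<choice><reg>ὑμῖν</reg><orig>ὑμεῖν</orig></choice> "
    '<supplied reason="lost">περ</supplied>ὶ τούτου</ab>'
)

if __name__ == "__main__":
    for name, text in split_layers(SAMPLE).items():
        print(f"{name}: {text}")
```

On the invented sample, the original layer keeps the writer's spelling ὑμεῖν and leaves out the restored letters, while the standardized layer gives ὑμῖν περὶ. A real converter would also have to handle other EpiDoc elements, such as abbreviations, gaps and corrections.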

Highlights

  • Greek papyri from Egypt have preserved Greek, in texts of varying length, as it was written by ancient speakers from ca. 300 BCE to 700 CE

  • The documentary papyrological corpus is freely available in digital form in the Papyrological Navigator (PN) platform, which allows users to search both text strings and metadata

  • In this article, we have described a process in which individual texts from the corpus of documentary Greek papyri can be preprocessed for the purposes of linguistic annotation

Summary

INTRODUCTION

Greek papyri from Egypt have preserved Greek, in texts of varying length, as it was written by ancient speakers from ca. 300 BCE to 700 CE. The search possibilities of the Papyrological Navigator do not, however, extend to querying linguistic structures or variation in spelling or morphosyntax. For this reason, the papyrological corpus has received little attention in most linguistic research on Ancient Greek. A research project by author 1 (“SEMATIA: Linguistic Annotation of the Greek Documentary Papyri – Detecting and Determining Contact-Induced, Dialectal and Stylistic Variation”, funded by the Academy of Finland) sought methods to make better use of the papyri for linguistic research. In this first phase, we needed a way to preprocess the papyri into a form that could be linguistically annotated. Some texts are the product of a single writer, but in many cases two or more people wrote in the same document, as attested by a change of handwriting.

Journal of Data Mining and Digital Humanities ISSN 2416-5999, an open-access journal http://jdmdh.episciences.org

BACKGROUND
PREPROCESSING THE PAPYRI
Technical realisation
METADATA
ONGOING AND FUTURE DEVELOPMENTS
CONCLUSION