Abstract
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential but lacks modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich, structured data are then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open them to a larger research community. Our contribution is two-fold. In terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Highlights
The Sumerian language, an agglutinative isolate, is the earliest known language recorded in writing
We adopt a Linked Open Data approach for this purpose: We provide and consult an OWL representation of the Cuneiform Digital Library Initiative (CDLI) annotation scheme and its linking with Universal Dependencies (UD) POS, feature and dependency labels as part of the Ontologies of Linguistic
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general
Summary
The Sumerian language, an agglutinative isolate, is the earliest known language recorded in writing. It was spoken in the third millennium BC in southern Iraq, and continued to be written until the late first millennium BC. Assyriologists make a text available for research by first copying and transcribing it from the inscribed artifact. A dozen projects that make various cuneiform corpora available online have emerged, building on digital transcriptions created as early as the 1960s. These initiatives rarely use shared conventions, and the available tool-set is limited. We employ Linguistic Linked Open Data (LLOD) technology to improve interoperability and resource integration for machine translation and linguistic annotation of Sumerian.
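The idea of linking a corpus-specific tag set to Universal Dependencies labels can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `cdli:` and `ud:` prefixes, the `ex:mapsTo` predicate, the four tag links, and the sample tokens are hypothetical and do not reproduce the project's OWL model or its actual mappings. Plain tuples stand in for RDF triples so that the example needs no RDF library.

```python
# Illustrative sketch only: prefixes, predicate name, and tag links are
# assumptions, not the project's actual ontology.
MAPS_TO = "ex:mapsTo"  # hypothetical linking predicate

# (subject, predicate, object) tuples standing in for RDF triples
TRIPLES = [
    ("cdli:N", MAPS_TO, "ud:NOUN"),
    ("cdli:V", MAPS_TO, "ud:VERB"),
    ("cdli:NU", MAPS_TO, "ud:NUM"),
    ("cdli:PN", MAPS_TO, "ud:PROPN"),
]


def ud_labels(tag, triples=TRIPLES):
    """Return all UD labels linked to a source tag, sorted."""
    return sorted(o for s, p, o in triples if s == tag and p == MAPS_TO)


def annotate(tokens, triples=TRIPLES):
    """Attach a UD label to each (form, tag) pair, or None if unlinked."""
    out = []
    for form, tag in tokens:
        labels = ud_labels(f"cdli:{tag}", triples)
        out.append((form, tag, labels[0] if labels else None))
    return out


if __name__ == "__main__":
    # Sample tokens are illustrative, not drawn from the annotated corpus.
    for row in annotate([("udu", "N"), ("ba-zi", "V")]):
        print(row)
```

Keeping the links as data rather than hard-coding them in the conversion routine means a revised mapping only requires changing the triples, which is the kind of interoperability a linked-data representation is meant to support.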