Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax

Christian Chiarcos, Jinyan Wang, Christian Fäth, William Mcgrath, Julius Steuer, Émilie Pagé-Perron, Jayanth Jayanth, Niko Schenk, Ilya Khait

doi:10.3390/info9110290

Abstract

This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open it to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.

Highlights

The Sumerian language, an agglutinative isolate, is the earliest known language recorded in writing
We adopt a Linked Open Data approach for this purpose: we provide and consult an OWL representation of the Cuneiform Digital Library Initiative (CDLI) annotation scheme and its linking with Universal Dependencies (UD) POS, feature and dependency labels as part of the Ontologies of Linguistic
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general

Summary

Introduction

The Sumerian language, an agglutinative isolate, is the earliest known language recorded in writing. It was spoken in the third millennium BC in southern Iraq, and continued to be written until the late first millennium BC. Assyriologists make a text available for research by first copying and transcribing it from the inscribed artifact. A dozen projects that make various cuneiform corpora available online have emerged, building on digital transcriptions created as early as the 1960s. These initiatives rarely use shared conventions, and the available tool-set is limited. We employ Linguistic Linked Open Data (LLOD) technology to improve interoperability and resource integration for machine translation and linguistic annotation of Sumerian.
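To make the idea of exposing tabular annotations as linked data more concrete, the following is a minimal sketch in Python. It is not the project's pipeline and not the CoNLL-RDF implementation; it only illustrates the general principle of turning tab-separated token rows into subject-predicate-object triples. The namespaces `BASE` and `VOCAB`, the function names, the column labels, and the three sample tokens are all assumptions made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, chosen only for this illustration.
BASE = "http://example.org/sumerian/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def conll_to_triples(conll: str, columns: List[str], sent_id: str) -> List[Triple]:
    """Turn tab-separated token rows into (subject, predicate, object) triples.

    The first column is assumed to hold the token ID; every other column
    becomes one property of the token.
    """
    triples: List[Triple] = []
    for line in conll.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        token_uri = f"<{BASE}{sent_id}_{fields[0]}>"
        for name, value in zip(columns[1:], fields[1:]):
            if value != "_":  # "_" marks an empty cell in CoNLL-style formats
                literal = f'"{escape_literal(value)}"'
                triples.append((token_uri, f"<{VOCAB}{name}>", literal))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples-style lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative sample only; not taken from the annotated corpus.
SAMPLE = "1\tlugal\tN\n2\te2\tN\n3\tmu-du3\tV\n"

if __name__ == "__main__":
    print(to_ntriples(conll_to_triples(SAMPLE, ["ID", "FORM", "POS"], "s1")))
```

Once annotations are in triple form, they can in principle be queried and rewritten with graph tools such as SPARQL, which is the kind of interoperability the LLOD approach aims for.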

Linked Open Data for Sumerian

The MTAAC Project

CoNLL Format

CoNLL-RDF

Annotation Workflow

Annotating Morphology

Dictionary-Based Pre-Annotation

Rule-Based Pre-Annotation with SPARQL

Application and Evaluation

Annotating Syntax

RDF-Based Pre-Annotation

Limits of Syntactic Pre-Annotation

Annotating Semantics

Machine Translation

Findings

Summary

Journal: Information
Publication Date: Nov 19, 2018
Citations: 11
License type: CC BY 4.0
