Toward a Sustainable Handling of Interlinear-Glossed Text in Language Documentation

Johann-Mattis List, Robert Forkel, Nathaniel A. Sims

doi:10.1145/3389010

Abstract

While the amount of digitally available data on the world’s languages is steadily increasing, with more and more languages being documented, only a small proportion of the language resources produced are sustainable. Data reuse is often difficult because of idiosyncratic formats and the neglect of standards that could help to increase the comparability of linguistic data. The sustainability problem is clearly reflected in the current practice of handling interlinear-glossed text, one of the crucial resources produced in language documentation. Although large collections of glossed texts have been produced, current data-handling practice makes them difficult to reuse. To address this problem, we propose a first framework for the computer-assisted, sustainable handling of interlinear-glossed text resources. Building on recent standardization proposals for word lists and structural datasets, combined with state-of-the-art methods for automated sequence comparison in historical linguistics, we show how our workflow can be used to lift a collection of interlinear-glossed texts in Qiang (an endangered language spoken in Sichuan, China), and how the lifted data can assist linguists in their research.

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Mar 31, 2021
Citations: 1

Similar Papers

Automated Parsing of Interlinear Glossed Text from Page Images of Grammatical Descriptions
...
01 Jan 2020

TypeCraft collaborative databasing and resource sharing for linguists
Dorothee Beermann ... Pavel Mihaylov
Language Resources and Evaluation | VOL. 48
15 Nov 2013

Computational strategies for reducing annotation effort in language documentation
Alexis Palmer ... Telma Can
Linguistic Issues in Language Technology | VOL. 3
01 Feb 2010

Language documentation and historical linguistics
Lyle Campbell
19 Apr 2016
