Abstract

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain – the dialects, languages and language families of the entire world – is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system responds to an important challenge in the realm of Linguistic Linked Data with numerous NLP applications.

Highlights

  • Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog

  • The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system responds to an important challenge in the realm of Linguistic Linked Data with numerous NLP applications

  • A glottocode consists of four alphanumeric characters and four decimal digits, for example abcd1234 or b10b1234


Summary

Introduction

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org, currently in edition 4.4, [14]). A glottocode consists of four alphanumeric characters (i.e., lowercase letters or decimal digits) and four decimal digits, for example abcd1234 or b10b1234. Glottocodes are complementary to three-letter ISO 639-3 language identification codes (see https://iso639-3.sil.org/), which concern languages only. There are 25,900 glottocodes (8,533 language-level, 4,571 family-level and 12,796 dialect-level).

Hammarström / Glottocodes: Identifiers linking families, languages and dialects
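The glottocode format described above (four lowercase letters or digits followed by four decimal digits) can be checked with a short regular expression. The following is a minimal sketch, not part of Glottolog's own tooling: the names `GLOTTOCODE_RE` and `is_glottocode` are hypothetical, and the check is purely syntactic. It says nothing about whether a well-formed string is actually assigned in a given Glottolog edition.

```python
import re

# Assumed pattern, derived from the textual description only:
# four characters from [a-z0-9], then four decimal digits.
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def is_glottocode(candidate: str) -> bool:
    """Return True if `candidate` has the syntactic shape of a glottocode."""
    # fullmatch requires the whole string to match, so longer or shorter
    # strings (such as three-letter ISO 639-3 codes) are rejected.
    return GLOTTOCODE_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for code in ["abcd1234", "b10b1234", "ABCD1234", "abc1234", "eng"]:
        print(code, is_glottocode(code))
```

Both examples from the text, abcd1234 and b10b1234, pass this check, whereas uppercase strings, strings of the wrong length and three-letter ISO 639-3 codes do not.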

Motivation and history
Glottolog data is interoperable
Glottolog data is reusable
Policies governing glottocode assignment
Glottolog versioning
Conclusion