The first annotated corpus of historical Basque

Ainara Estarrona, Manuel Padilla-Moyano, Ricardo Etxepare, Ander Soraluze, Izaskun Etxeberria

doi:10.1093/llc/fqab066

Abstract

This article presents the development of a morphosyntactically annotated diachronic corpus of Basque, together with the first results of processing historical varieties of the language with computational techniques. The corpus contains around one million words, spans the period from the 15th to the mid-18th century, and covers the most significant written production in all historical dialects. Morphosyntactic tagging allows systematic searches at different levels of complexity, and a rich set of metadata also enables searches based on sociohistorical criteria. This is not only the first tagged corpus of historical Basque but also a means of improving language-processing tools by analyzing historical varieties that are more or less distant from the present-day standard language. Moreover, the project aims to set a model for further work on historical corpora of Basque and to inform similar projects for other languages.
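The abstract does not describe the corpus format or its query interface. As a purely hypothetical illustration of how token-level morphosyntactic tags and document-level sociohistorical metadata can be combined in one search, the following sketch assumes that each token carries a form, a lemma, a part-of-speech tag and an optional case feature, and that each document records a century and a dialect. The field names, tag values (`NOUN`, `VERB`, `INE`, `ERG`) and toy documents are invented for this example; they are not the corpus's actual schema, tagset or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    form: str                   # word as written in the source text
    lemma: str                  # normalized dictionary form
    pos: str                    # part-of-speech tag (hypothetical tagset)
    case: Optional[str] = None  # case feature for nominals, if any


@dataclass(frozen=True)
class Document:
    doc_id: str
    century: int                # hypothetical metadata: century of composition
    dialect: str                # hypothetical metadata: historical dialect
    tokens: Tuple[Token, ...]


def search(docs, *, pos=None, case=None, century=None, dialect=None) -> List[Tuple[str, str]]:
    """Return (doc_id, form) pairs matching both metadata and token filters."""
    hits = []
    for doc in docs:
        # Document-level (sociohistorical) filters are checked first.
        if century is not None and doc.century != century:
            continue
        if dialect is not None and doc.dialect != dialect:
            continue
        for tok in doc.tokens:
            # Token-level (morphosyntactic) filters.
            if pos is not None and tok.pos != pos:
                continue
            if case is not None and tok.case != case:
                continue
            hits.append((doc.doc_id, tok.form))
    return hits


# Toy documents invented for illustration only.
docs = [
    Document("doc1", 16, "Labourdin", (
        Token("etxean", "etxe", "NOUN", "INE"),
        Token("gizonak", "gizon", "NOUN", "ERG"),
    )),
    Document("doc2", 17, "Biscayan", (
        Token("etxean", "etxe", "NOUN", "INE"),
        Token("dator", "etorri", "VERB"),
    )),
]

print(search(docs, case="INE", century=16))  # [('doc1', 'etxean')]
```

Filtering on metadata before iterating over tokens lets sociohistorical criteria prune whole documents cheaply; a real corpus query engine would typically rely on indexes rather than a linear scan.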

More From: Digital Scholarship in the Humanities

Similar Papers

Linguistic Resources Construction: Towards Disfluency Processing in Spontaneous Tunisian Dialect Speech
Emna Boughariou ... Lamia Hadrich Bleguith
01 Jan 2019

Myanmar POS Resource Extension Effects on Automatic Tagging Methods
Zar Zar Hlaing ... Thepchai Supnithi
18 Nov 2020

Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words
Tomaž Erjavec ... Sašo Džeroski
Applied Artificial Intelligence | VOL. 18
01 Jan 2004

On the Use of Morpho-Syntactic Description Tags in Neural Machine Translation with Small and Large Training Corpora
Gregor Donaj ... Mirjam Sepesy Maučec
Mathematics | VOL. 10
09 May 2022
