Abstract

Learner corpora are gaining popularity in the Baltic States, as elsewhere in the world. This article discusses which kinds of annotation have been used in learner corpus research in Latvia and Lithuania so far and describes which of them would be most suitable for Esam, the newly created learner corpus of the second Baltic language. Much learner corpus research in Latvia and Lithuania is undertaken without any annotation. The most common types of annotation are those based on the theory of levels of language: morphological and syntactic annotation. There is little collaboration between researchers of the two neighbouring countries, but linguists within each country collaborate closely, using similar annotation schemes and creating corpora that are comparable in some respects. The learner corpus of the second Baltic language should try to fit into this picture to some extent, and part-of-speech annotation and simple syntactic annotation could help with that. However, kinds of annotation that have not yet become popular in learner corpus research in this region could also be useful. Therefore, error annotation and lemmatization have also been included in the annotation plan of the corpus Esam. DOI: http://dx.doi.org/10.7220/2335-2027.7.8
