Abstract

We propose, for Kadiwéu, a polysynthetic language of Brazil, an extension of the POS annotation scheme of the Tycho Brahe Annotated Corpus of Historical Portuguese (www.tycho.iel.unicamp.br/~tycho/corpus), henceforth TBC. The extension consists of tagging both words and morphemes, yielding a two-level annotation. Word-level tagging is necessary to generate the syntactic parsing that is missing from current corpora of Brazilian native languages. Morphological tagging is also crucial for polysynthetic languages, since it allows searching for grammatical properties encoded by morphemes. The proposal is pioneering: it is the first time an American Indian language will be part of a corpus that allows grammatical searches combining morphological and syntactic information.
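To make the idea of a two-level annotation concrete, the following is a minimal sketch of how a word carrying its own tag plus a sequence of tagged morphemes might be represented and searched. It is not the TBC format or the authors' implementation: the class names, the `find_words` helper, the tag labels (`VB`, `N`, `SUBJ-1SG`, `ROOT`, `PL`), and the forms (`w1`, `m1`, and so on) are all hypothetical placeholders, not Kadiwéu data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Morpheme:
    form: str  # placeholder segment, not real Kadiwéu data
    tag: str   # hypothetical morpheme-level tag


@dataclass
class Word:
    form: str  # placeholder word form
    tag: str   # hypothetical word-level (POS) tag
    morphemes: List[Morpheme] = field(default_factory=list)


def find_words(sentence: List[Word],
               word_tag: Optional[str] = None,
               morpheme_tag: Optional[str] = None) -> List[Word]:
    """Return words matching a word-level tag and/or containing a morpheme tag."""
    hits = []
    for word in sentence:
        if word_tag is not None and word.tag != word_tag:
            continue
        if morpheme_tag is not None and all(
            m.tag != morpheme_tag for m in word.morphemes
        ):
            continue
        hits.append(word)
    return hits


# Hypothetical two-word sentence; every form and tag is an invented placeholder.
sentence = [
    Word("w1", "VB", [Morpheme("m1", "SUBJ-1SG"), Morpheme("m2", "ROOT")]),
    Word("w2", "N", [Morpheme("m3", "ROOT"), Morpheme("m4", "PL")]),
]

# Search on the morpheme level alone, then on both levels at once.
plural_words = find_words(sentence, morpheme_tag="PL")
verbs_with_1sg = find_words(sentence, word_tag="VB", morpheme_tag="SUBJ-1SG")
```

Keeping the word tag and the morpheme tags in separate fields is what lets a single query combine both levels, which is the kind of search the abstract describes.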
