Portuguese POS Tagging Using BLSTM Without Handcrafted Features

Rômulo César Costa De Sousa, Hélio Lopes

doi:10.1007/978-3-030-33904-3_11

Publication Date: Jan 1, 2019

Citations: 1

Affiliation: Pontifical Catholic University of Rio de Janeiro

Abstract

Training state-of-the-art part-of-speech (POS) taggers has traditionally required many handcrafted features and external data. In this paper, we propose a neural network architecture for the POS tagging task on both contemporary and historical Portuguese texts that requires neither of these. It combines word embeddings and character embeddings with a bidirectional LSTM (BLSTM) layer. We apply the architecture to three Portuguese corpora, obtaining state-of-the-art accuracy of 97.87% on the Mac-Morpho corpus, 97.62% on the revised Mac-Morpho, and 97.36% on Tycho Brahe. We also improve the tagging accuracy for out-of-vocabulary (OOV) words on the Mac-Morpho corpus and the revised Mac-Morpho.
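The abstract does not give layer-level details, so the following is only a minimal, untrained forward-pass sketch in plain Python of the general design it names: each token is represented by a word embedding concatenated with a character-derived vector, the sequence is fed to a BLSTM, and each output state is scored against the tag set. Several choices here are assumptions, not the paper's: the character vector is built by a second BLSTM over the word's characters, OOV words share a zero word vector, biases are omitted, weights are random, and the class names (`LSTM`, `BLSTM`, `Tagger`), dimensions, and the toy tag list are hypothetical. The sketch is mainly meant to show why OOV words can still be tagged sensibly: the character part is computed from spelling even when the word-embedding lookup fails.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTM:
    """One-direction LSTM with random, untrained weights (biases omitted)."""

    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        # Gates: input (i), forget (f), output (o), candidate cell (c),
        # each acting on the concatenation [x_t; h_{t-1}].
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid)]
                      for _ in range(hid)] for g in "ifoc"}

    def run(self, xs):
        h, c = [0.0] * self.hid, [0.0] * self.hid
        outputs = []
        for x in xs:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            i = [sigmoid(v) for v in matvec(self.W["i"], z)]
            f = [sigmoid(v) for v in matvec(self.W["f"], z)]
            o = [sigmoid(v) for v in matvec(self.W["o"], z)]
            g = [math.tanh(v) for v in matvec(self.W["c"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTM:
    """Runs one LSTM left-to-right and another right-to-left, then concatenates."""

    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.fwd = LSTM(in_dim, hid, rng)
        self.bwd = LSTM(in_dim, hid, rng)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align with input order
        return [hf + hb for hf, hb in zip(forward, backward)]


class Tagger:
    """Hypothetical word + character BLSTM tagger (forward pass only)."""

    def __init__(self, words, chars, tags, wdim=8, cdim=4, chid=4, hid=6, seed=0):
        rng = random.Random(seed)

        def vec(n):
            return [rng.uniform(-1, 1) for _ in range(n)]

        self.tags = tags
        self.word_emb = {w: vec(wdim) for w in words}
        self.char_emb = {ch: vec(cdim) for ch in chars}
        self.unk_word, self.unk_char = [0.0] * wdim, [0.0] * cdim
        self.char_blstm = BLSTM(cdim, chid, rng)
        self.word_blstm = BLSTM(wdim + 2 * chid, hid, rng)
        self.proj = [vec(2 * hid) for _ in tags]  # one score row per tag

    def word_repr(self, word):
        # Character part: final forward state + final backward state.
        chars = [self.char_emb.get(ch, self.unk_char) for ch in word]
        states = self.char_blstm.run(chars)
        k = self.char_blstm.hid
        char_vec = states[-1][:k] + states[0][k:]
        # Word part: OOV words fall back to a shared zero vector.
        return self.word_emb.get(word, self.unk_word) + char_vec

    def predict(self, sentence):
        hs = self.word_blstm.run([self.word_repr(w) for w in sentence])
        preds = []
        for h in hs:
            scores = matvec(self.proj, h)
            preds.append(self.tags[max(range(len(scores)), key=scores.__getitem__)])
        return preds


tagger = Tagger(words=["o", "gato", "dorme"], chars="ogatdrme", tags=["ART", "N", "V"])
# Three tags are returned; they are arbitrary because the weights are untrained.
print(tagger.predict(["o", "gato", "dorme"]))
```

Because the weights are random, only the shapes and data flow are meaningful here; in a real tagger every embedding and matrix would be learned from an annotated corpus.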
