Abstract

Training state-of-the-art part-of-speech (POS) taggers has traditionally required extensive handcrafted features and external data. In this paper, we propose a neural network architecture for POS tagging of both contemporary and historical Portuguese texts that requires neither of these. It combines word embeddings and character embeddings with a bidirectional LSTM (BLSTM) layer. We apply the architecture to three Portuguese corpora and obtain state-of-the-art accuracies of 97.87% on the Mac-Morpho corpus, 97.62% on the revised Mac-Morpho, and 97.36% on Tycho Brahe. We also improve tagging accuracy for out-of-vocabulary (OOV) words on both Mac-Morpho and the revised Mac-Morpho.
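
The sketch below shows the general data flow of a tagger built from word embeddings, character embeddings and a BLSTM. It uses only the Python standard library, runs a forward pass with random weights, and does no training. It is not the authors' implementation. The following details are all hypothetical choices made for illustration:

- the dimensions;
- averaging character embeddings, which stands in for whatever character-level encoder the paper actually uses;
- the linear output layer scored with argmax;
- the tag names and every identifier in the code.

Index 0 of each embedding table is reserved for unknown items. This shows one way an OOV word can still receive an informative representation from its characters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """One-direction LSTM with input (i), forget (f), output (o) and candidate (g) gates."""

    def __init__(self, rng, in_dim, hid_dim):
        self.hid = hid_dim
        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {
            gate: (rand_matrix(rng, hid_dim, in_dim),
                   rand_matrix(rng, hid_dim, hid_dim),
                   [0.0] * hid_dim)
            for gate in "ifog"
        }

    def run(self, inputs):
        h = [0.0] * self.hid
        c = [0.0] * self.hid
        outputs = []
        for x in inputs:
            pre = {gate: [a + r + bb for a, r, bb in zip(matvec(w, x), matvec(u, h), b)]
                   for gate, (w, u, b) in self.params.items()}
            i = [sigmoid(z) for z in pre["i"]]
            f = [sigmoid(z) for z in pre["f"]]
            o = [sigmoid(z) for z in pre["o"]]
            g = [math.tanh(z) for z in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTMTagger:
    """Hypothetical untrained tagger: [word emb ; char repr] -> BLSTM -> linear -> argmax."""

    def __init__(self, words, chars, tags, word_dim=8, char_dim=4, hid_dim=6, seed=0):
        rng = random.Random(seed)
        self.tags = list(tags)
        # Index 0 of each table is reserved for unknown (OOV) words or characters.
        self.word_ix = {w: k + 1 for k, w in enumerate(words)}
        self.char_ix = {ch: k + 1 for k, ch in enumerate(chars)}
        self.word_emb = rand_matrix(rng, len(self.word_ix) + 1, word_dim)
        self.char_emb = rand_matrix(rng, len(self.char_ix) + 1, char_dim)
        in_dim = word_dim + char_dim
        self.fwd = LSTM(rng, in_dim, hid_dim)
        self.bwd = LSTM(rng, in_dim, hid_dim)
        self.out = rand_matrix(rng, len(self.tags), 2 * hid_dim)

    def char_repr(self, token):
        # Assumption: mean of character embeddings (tokens must be non-empty).
        vecs = [self.char_emb[self.char_ix.get(ch, 0)] for ch in token]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def token_repr(self, token):
        # Concatenate the word vector (index 0 if OOV) with the character-based vector.
        return self.word_emb[self.word_ix.get(token, 0)] + self.char_repr(token)

    def tag(self, tokens):
        xs = [self.token_repr(t) for t in tokens]
        forward = self.fwd.run(xs)
        # Run the second LSTM right-to-left, then restore left-to-right order.
        backward = self.bwd.run(xs[::-1])[::-1]
        result = []
        for hf, hb in zip(forward, backward):
            scores = matvec(self.out, hf + hb)
            result.append(self.tags[max(range(len(scores)), key=scores.__getitem__)])
        return result


if __name__ == "__main__":
    tagger = BLSTMTagger(words=["o", "gato", "dorme"],
                         chars="abcdefghijklmnopqrstuvwxyz",
                         tags=["ART", "N", "V"])
    # "bem" is OOV: it gets the unknown-word vector plus its character representation.
    print(tagger.tag(["o", "gato", "dorme", "bem"]))
```

Because the weights are random, the printed tags are arbitrary. The point is the shape of the computation. The output at each position concatenates a left-to-right and a right-to-left hidden state, so every tag decision can draw on context from both sides of the word.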
