Abstract

Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information without losing information specific to the target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge about the relation between the source language and the target language.

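The language-adversarial training named in the abstract is commonly realized with a gradient-reversal layer feeding a language discriminator, so that the common BLSTM is pushed toward language-general representations. The sketch below is only an illustration of that setup under our own assumptions (PyTorch, mean pooling over the common BLSTM states, illustrative class and parameter names); it is not the authors' implementation.

```python
# Illustrative sketch only: a gradient-reversal layer and language
# discriminator of the kind commonly used for language-adversarial training.
# Names and hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class LanguageDiscriminator(nn.Module):
    """Predicts the language of a sentence from the common BLSTM states.

    Because gradients are reversed on the way back, training the tagger and
    this discriminator jointly encourages the common BLSTM to produce
    representations from which the language is hard to recover.
    """

    def __init__(self, hidden_dim, num_languages, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(hidden_dim, num_languages)

    def forward(self, common_states):
        # common_states: (batch, seq_len, hidden_dim) from the common BLSTM.
        pooled = common_states.mean(dim=1)             # simple mean pooling
        reversed_feats = GradReverse.apply(pooled, self.lambd)
        return self.classifier(reversed_feats)         # language logits
```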
Highlights

  • Bidirectional Long Short-Term Memory (BLSTM) based models (Graves and Schmidhuber, 2005), along with word embeddings and character embeddings, have shown competitive performance on Part-of-Speech (POS) tagging given a sufficient amount of training examples (Ling et al., 2015; Lample et al., 2016; Plank et al., 2016; Yang et al., 2017).

  • Given insufficient training examples, we can improve POS tagging performance with cross-lingual POS tagging, which exploits affluent POS tagging corpora from other source languages.

  • We introduce a cross-lingual transfer learning model for POS tagging requiring no cross-lingual resources, where knowledge transfer is made in the BLSTM layers on top of word embeddings and character embeddings.

Summary

Introduction

Bidirectional Long Short-Term Memory (BLSTM) based models (Graves and Schmidhuber, 2005), along with word embeddings and character embeddings, have shown competitive performance on Part-of-Speech (POS) tagging given a sufficient amount of training examples (Ling et al., 2015; Lample et al., 2016; Plank et al., 2016; Yang et al., 2017). Given an input word sequence, a BLSTM is run over the character sequence of each word, and the final outputs of the forward LSTM and the backward LSTM are concatenated to the word vector of the current word to supplement the word representation. These representations serve as the input to a BLSTM, and an output layer is used for POS tag prediction. The outputs of the common BLSTM and the private BLSTM of the current language are summed and used as the input to the softmax layer that predicts the POS tags of the given word sequence. Bidirectional language modeling is added as an auxiliary objective: if the current sentence is "I am happy", the forward LSTM is trained to predict the following words "am happy" and the backward LSTM the preceding words "I am". This objective encourages the BLSTM layers and the embedding layers to learn linguistically general-purpose representations, which are useful for specific downstream tasks (Rei, 2017). Instead, it is set to the size of the target training set divided by the size of the source training set.
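As a concrete reading of the model description above, the following sketch shows the character-level BLSTM feeding the word representation, a common BLSTM shared across languages, a private BLSTM per language, and their summed outputs feeding the tag output layer. It assumes PyTorch; all dimensions, module names, and the per-language lookup are illustrative, not the authors' exact implementation.

```python
# Minimal sketch of the tagger architecture described above (PyTorch).
import torch
import torch.nn as nn


class CrossLingualTagger(nn.Module):
    def __init__(self, word_vocab, char_vocab, num_tags, languages,
                 word_dim=128, char_dim=32, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character-level BLSTM; its final forward/backward states are
        # concatenated to the word embedding of the current word.
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True,
                                 batch_first=True)
        input_dim = word_dim + 2 * char_dim
        # One common BLSTM shared by all languages, one private BLSTM per language.
        self.common = nn.LSTM(input_dim, hidden_dim, bidirectional=True,
                              batch_first=True)
        self.private = nn.ModuleDict({
            lang: nn.LSTM(input_dim, hidden_dim, bidirectional=True,
                          batch_first=True)
            for lang in languages
        })
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, words, chars, lang):
        # words: (batch, seq_len); chars: (batch * seq_len, max_word_len)
        batch, seq_len = words.shape
        _, (h_n, _) = self.char_lstm(self.char_emb(chars))
        # h_n: (2, batch * seq_len, char_dim) -> concatenate both directions.
        char_feats = torch.cat([h_n[0], h_n[1]], dim=-1).view(batch, seq_len, -1)
        x = torch.cat([self.word_emb(words), char_feats], dim=-1)
        common_out, _ = self.common(x)
        private_out, _ = self.private[lang](x)
        # Outputs of the common and private BLSTMs are summed, then fed to
        # the output layer that predicts POS tags.
        return self.out(common_out + private_out)   # logits over POS tags
```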

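The auxiliary bidirectional language-modeling objective and the data-size-based ratio mentioned above could be wired up roughly as follows. This is a sketch under two assumptions: the LM heads are simple linear projections over the vocabulary, and the target-to-source size ratio is used as the weight on the source-language loss (the summary's "it" leaves this implicit).

```python
# Sketch of the auxiliary objectives described above (PyTorch). The heads and
# the loss-weighting scheme are illustrative assumptions, not the authors' code.
import torch.nn as nn
import torch.nn.functional as F


class BiLMHeads(nn.Module):
    """Auxiliary bidirectional language-modeling heads.

    Given BLSTM states for "I am happy", the forward direction is trained to
    predict the following words ("am", "happy", </s>) and the backward
    direction to predict the preceding words (<s>, "I", "am").
    """

    def __init__(self, hidden_dim, word_vocab):
        super().__init__()
        self.fwd_head = nn.Linear(hidden_dim, word_vocab)  # predicts next word
        self.bwd_head = nn.Linear(hidden_dim, word_vocab)  # predicts previous word

    def forward(self, fwd_states, bwd_states, next_words, prev_words):
        # fwd_states, bwd_states: (batch, seq_len, hidden_dim) per direction.
        fwd_loss = F.cross_entropy(
            self.fwd_head(fwd_states).flatten(0, 1), next_words.flatten())
        bwd_loss = F.cross_entropy(
            self.bwd_head(bwd_states).flatten(0, 1), prev_words.flatten())
        return fwd_loss + bwd_loss


# Assumed weighting of the source-language loss: rather than tuning a separate
# hyperparameter, use |target train set| / |source train set|.
def source_loss_weight(num_target_sentences, num_source_sentences):
    return num_target_sentences / num_source_sentences
```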