Abstract

Modern NLP applications have enjoyed a great boost from neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting, where training data in multiple source languages is leveraged to further boost target-language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at the instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target-language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks, including a large-scale industry dataset.
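
The abstract describes the two core ideas at a high level: a shared encoder that produces language-invariant features, and a mixture-of-experts (MoE) layer with one expert per source language that supplies language-specific features, gated per instance. The snippet below is a minimal, illustrative PyTorch sketch of how such a combination could be wired; the module names, dimensions, and exact wiring are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoETransferModel(nn.Module):
    """Illustrative combination of language-invariant and language-specific features."""
    def __init__(self, emb_dim, hidden_dim, num_classes, num_source_langs):
        super().__init__()
        # Shared encoder: intended to be trained (e.g. adversarially) so that
        # its output features do not reveal the input language.
        self.shared = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
        # One expert per source language captures language-specific patterns.
        self.experts = nn.ModuleList(
            [nn.Linear(emb_dim, hidden_dim) for _ in range(num_source_langs)]
        )
        # Gate: per instance, decides how much weight each expert receives,
        # reflecting how similar the input looks to each source language.
        self.gate = nn.Linear(emb_dim, num_source_langs)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, emb_dim) multilingual embeddings of the input
        invariant = self.shared(x)                                     # (B, H)
        weights = torch.softmax(self.gate(x), dim=-1)                  # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, H)
        specific = (weights.unsqueeze(-1) * expert_out).sum(dim=1)     # (B, H)
        # Concatenate the two views and classify.
        return self.classifier(torch.cat([invariant, specific], dim=-1))

# Example usage with random data: 3 source languages, 300-dim embeddings.
model = MoETransferModel(emb_dim=300, hidden_dim=128, num_classes=2, num_source_langs=3)
logits = model(torch.randn(8, 300))  # shape (8, 2)
```

The key design point is that the gate operates per instance, so each input can draw on whichever source languages it most resembles rather than a single fixed sharing scheme.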

Highlights

  • Recent advances in deep learning have enabled a wide variety of NLP models to achieve impressive performance, thanks in part to the availability of large-scale annotated datasets

  • We evaluate our model on multiple multilingual transfer learning (MLTL) tasks ranging from text classification to named entity recognition and semantic slot filling, including a real-world industry dataset

  • We propose multinomial adversarial network (MAN)-MoE, a multilingual model transfer approach that exploits both language-invariant and language-specific features, departing from most previous models, which can only make use of shared features (an illustrative sketch of the adversarial component follows this list)
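
The adversarial component referenced in the last highlight can be pictured as a language discriminator trained to identify the source language of the shared features, while the shared encoder is updated to confuse it. Below is a hedged, generic adversarial-training sketch in PyTorch, not the paper's MAN objective; the dimensions, optimizers, and loss wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative setup: 3 source languages, 300-dim multilingual embeddings.
shared_encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
lang_discriminator = nn.Linear(128, 3)

enc_opt = torch.optim.Adam(shared_encoder.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(lang_discriminator.parameters(), lr=1e-3)
xent = nn.CrossEntropyLoss()

def adversarial_step(x, lang_ids):
    """One illustrative training step; x: (B, 300), lang_ids: (B,) language labels."""
    # 1) Train the discriminator to recognize the language of shared features.
    disc_opt.zero_grad()
    d_loss = xent(lang_discriminator(shared_encoder(x).detach()), lang_ids)
    d_loss.backward()
    disc_opt.step()
    # 2) Train the encoder to fool the discriminator, pushing its features
    #    toward language-invariance (in practice combined with the task loss).
    enc_opt.zero_grad()
    g_loss = -xent(lang_discriminator(shared_encoder(x)), lang_ids)
    g_loss.backward()
    enc_opt.step()
    return d_loss.item(), g_loss.item()

adversarial_step(torch.randn(8, 300), torch.randint(0, 3, (8,)))
```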


Summary

Introduction

Recent advances in deep learning have enabled a wide variety of NLP models to achieve impressive performance, thanks in part to the availability of large-scale annotated datasets. Such an advantage is not available to most of the world's languages, since many of them lack the labeled data necessary for training deep neural nets on a variety of NLP tasks. Most research on CLTL has been devoted to the standard bilingual transfer (BLTL) case, where training data comes from a single source language. In this work, we focus on the multi-source CLTL scenario, known as multilingual transfer learning (MLTL), to further boost target-language performance.

