Abstract

Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient generalizations from task–language combinations with available data to low-resource ones? In this work, we propose a Bayesian generative model for the space of neural parameters. We assume that this space can be factorized into latent variables for each language and each task. We infer the posteriors over such latent variables based on data from seen task–language combinations through variational inference. This enables zero-shot classification on unseen combinations at prediction time. For instance, given training data for named entity recognition (NER) in Vietnamese and for part-of-speech (POS) tagging in Wolof, our model can perform accurate predictions for NER in Wolof. In particular, we experiment with a typologically diverse sample of 33 languages from 4 continents and 11 families, and show that our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods. Our code is available at github.com/cambridgeltl/parameter-factorization.
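The abstract describes the approach only at a high level. The snippet below is a minimal, hypothetical PyTorch sketch of the general idea of parameter factorization, assuming Gaussian variational posteriors over per-task and per-language latent variables and a linear generator that maps the combined latents to classifier weights. All class and variable names (e.g., TaskLanguageFactorization, to_weights) are illustrative and not taken from the authors' released code.

```python
import torch
import torch.nn as nn


class TaskLanguageFactorization(nn.Module):
    """Sketch: classifier parameters generated from per-task and per-language latents."""

    def __init__(self, n_tasks, n_langs, latent_dim, feat_dim, n_classes):
        super().__init__()
        # Variational posteriors: one diagonal Gaussian per task and per language.
        self.task_mu = nn.Embedding(n_tasks, latent_dim)
        self.task_logvar = nn.Embedding(n_tasks, latent_dim)
        self.lang_mu = nn.Embedding(n_langs, latent_dim)
        self.lang_logvar = nn.Embedding(n_langs, latent_dim)
        # Generator mapping the concatenated latents to classifier weights and biases.
        self.to_weights = nn.Linear(2 * latent_dim, feat_dim * n_classes + n_classes)
        self.feat_dim, self.n_classes = feat_dim, n_classes

    @staticmethod
    def reparameterize(mu, logvar):
        # Reparameterization trick so gradients flow through the posterior parameters.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, features, task_id, lang_id):
        # task_id / lang_id: 0-dim LongTensors indexing the task and the language.
        t = self.reparameterize(self.task_mu(task_id), self.task_logvar(task_id))
        l = self.reparameterize(self.lang_mu(lang_id), self.lang_logvar(lang_id))
        params = self.to_weights(torch.cat([t, l], dim=-1))
        W = params[: self.feat_dim * self.n_classes].view(self.n_classes, self.feat_dim)
        b = params[self.feat_dim * self.n_classes :]
        return features @ W.T + b  # logits for this task-language combination
```

Under these assumptions, training would maximize an ELBO-style objective over the seen combinations (e.g., NER–Vietnamese and POS–Wolof); at prediction time, the inferred Wolof language latent is paired with the NER task latent to obtain a zero-shot NER classifier for Wolof, with no NER–Wolof training data.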

Highlights

  • The annotation efforts in NLP have achieved impressive feats, such as the Universal Dependencies (UD) project (Nivre et al., 2019), which includes 83 languages

  • PF-d and PF-lr gain 4.49 / 4.20 in accuracy (∼7%) for POS tagging and 7.29 / 7.73 in F1 score (∼10%) for named entity recognition (NER) on average compared to transfer from the largest source (LS), the strongest baseline for single-source transfer

  • More details about the individual results on each task–language pair are provided in Figure 2, which reports the mean of the results over 3 separate runs


Summary

Introduction

The annotation efforts in NLP have achieved impressive feats, such as the Universal Dependencies (UD) project (Nivre et al., 2019), which includes 83 languages. Yet labeled data is costly and labor-intensive to create, and it remains missing for most task–language combinations. This shortage hinders the development of computational models for the majority of the world's languages (Snyder and Barzilay, 2010; Ponti et al., 2019a). Zero-shot transfer across languages implies a change in the data domain and leverages information from resource-rich languages to tackle the same task in a previously unseen target language (Lin et al., 2019; Rijhwani et al., 2019; Artetxe and Schwenk, 2019; Ponti et al., 2019a, inter alia). Zero-shot transfer across tasks within the same language (Ruder et al., 2019a), on the other hand, implies a change in the space of labels.

