Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification

Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber

doi:10.1609/aaai.v34i05.6500

Abstract

Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (CACO) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
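
The abstract describes an embedder that builds word vectors from spelling, trained jointly with a word-based classifier over a character vocabulary shared by both languages. Below is a minimal toy sketch of that idea under stated assumptions, not the authors' CACO implementation: the class `CharSubwordClassifier` and the helpers `char_ngrams` and `sigmoid` are hypothetical, the embedder is swapped for averaged character-trigram vectors, the classifier is a bias-free logistic-regression layer, and the Spanish (source) and Portuguese (target) word lists are made up for illustration. Because both languages look up vectors in one n-gram table, a target word such as "futebol" reuses vectors learned from the source word "fútbol".

```python
import math
import random


def char_ngrams(word, n=3):
    """Character n-grams of `word`, padded with < and > boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CharSubwordClassifier:
    """Hypothetical toy model: shared char n-gram embedder + linear classifier."""

    def __init__(self, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.ngram_vecs = {}  # one table shared by source and target words
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    def _ngram_vec(self, gram):
        # Unseen n-grams start at zero, so they add nothing to a score.
        return self.ngram_vecs.setdefault(gram, [0.0] * self.dim)

    def _doc_vector(self, words):
        # Word vector = mean of its n-gram vectors; document = mean of word vectors.
        doc = [0.0] * self.dim
        for word in words:
            grams = char_ngrams(word)
            for gram in grams:
                vec = self._ngram_vec(gram)
                for k in range(self.dim):
                    doc[k] += vec[k] / (len(grams) * len(words))
        return doc

    def _score(self, doc):
        return sum(wk * hk for wk, hk in zip(self.w, doc))

    def predict(self, words):
        return int(sigmoid(self._score(self._doc_vector(words))) >= 0.5)

    def train_step(self, words, label, lr=0.5):
        """One SGD step on the logistic loss, updating embedder and classifier jointly."""
        doc = self._doc_vector(words)
        grad_z = sigmoid(self._score(doc)) - label  # dLoss/dScore
        # Embedder update: backpropagate through both averages, using the
        # classifier weights from before this step.
        for word in words:
            grams = char_ngrams(word)
            scale = lr * grad_z / (len(grams) * len(words))
            for gram in grams:
                vec = self._ngram_vec(gram)
                for k in range(self.dim):
                    vec[k] -= scale * self.w[k]
        # Classifier update: dLoss/dw = grad_z * doc.
        for k in range(self.dim):
            self.w[k] -= lr * grad_z * doc[k]


# Made-up labeled source data (Spanish): 1 = sports, 0 = politics.
source = [("fútbol gol equipo".split(), 1), ("ley elecciones congreso".split(), 0)]
model = CharSubwordClassifier()
for _ in range(200):
    for words, label in source:
        model.train_step(words, label)

# Unlabeled target documents (Portuguese) share spellings with the source words.
print(model.predict("futebol gol equipe".split()))      # prints 1 (sports)
print(model.predict("lei eleições congresso".split()))  # prints 0 (politics)
```

Zero-initialized n-gram vectors and the missing bias keep this toy easy to reason about: n-grams never seen in the labeled source documents contribute nothing to a target score, so the target predictions depend only on subwords such as "bol", "<eq", or "<el" that the two languages share.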

More From: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 16

Similar Papers

Research of Text Classification Based on Improved TF-IDF Algorithm
Cai-Zhi Liu ... Zhi-Qiang Wei
01 Aug 2018

Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Wietse De Vries ... Malvina Nissim
01 Jan 2021

Detecting Offensive Language on Malay Social Media: A Zero-Shot, Cross-Language Transfer Approach Using Dual-Branch mBERT
Xingyi Guo ... Muhammad Zaiamri Zainal Abidin
Applied sciences | VOL. 14
02 Jul 2024

Using Related Languages to Enhance Statistical Language Models
Anna Currey ... Jon Dehdari
01 Jan 2015
