Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

Parul Chopra, Alan W Black, Sai Krishna Rallabandi, Khyathi Raghavi Chandu

doi:10.18653/v1/2021.findings-emnlp.373

Abstract

Code-switching (CS), a ubiquitous phenomenon in multilingual communities owing to the ease of communication it offers, remains an understudied problem in language processing. The primary reasons for this are (1) minimal effort in leveraging large pretrained multilingual models and (2) the lack of annotated data. What distinguishes CS as a case where multilingual models perform poorly is the intra-sentence mixing of languages, which gives rise to switch points. We first benchmark two sequence labeling tasks, POS tagging and NER, on 4 different language pairs with a suite of pretrained models to identify the problems and select the best-performing model among them, char-BERT (addressing (1)). We then propose a self-training method that repurposes existing pretrained models using a switch-point bias by leveraging unannotated data (addressing (2)). Finally, we demonstrate that our approach performs well on both tasks, narrowing the gap in switch-point performance while retaining overall performance on two distinct language pairs. Our code is available here: https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.
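
To make the notion of a switch point concrete, the sketch below marks positions in a sentence where the language changes from one token to the next. It is only an illustration under stated assumptions: the abstract does not give a formal definition, so the rule used here (a token is a switch point if its language tag differs from the previous token's) is an assumption, and the function name `switch_points`, the tags, and the example sentence are hypothetical rather than taken from the authors' code.

```python
from typing import List


def switch_points(lang_tags: List[str]) -> List[int]:
    """Return indices of tokens whose language tag differs from the previous token's.

    Assumed definition for illustration only; the paper may define
    switch points differently (for example, by skipping neutral tokens).
    """
    return [
        i for i in range(1, len(lang_tags))
        if lang_tags[i] != lang_tags[i - 1]
    ]


# Hypothetical Hindi-English sentence with per-token language tags.
tokens = ["mujhe", "yeh", "movie", "bahut", "pasand", "hai"]
tags = ["hi", "hi", "en", "hi", "hi", "hi"]

points = switch_points(tags)
print([(i, tokens[i]) for i in points])
```

Under this assumed definition, a sentence written entirely in one language has no switch points, so a switch-point count could serve as one simple signal of how much intra-sentence mixing a sentence contains.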

Publication Date: Jan 1, 2021
Citations: 1	License type: cc-by
