Unified benchmark for zero-shot Turkish text classification

Emrecan Çelik, Tuğba Dalyan

doi:10.1016/j.ipm.2023.103298

Abstract

Effective learning schemes such as fine-tuning, zero-shot learning, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training examples. In this paper, we present a unified benchmark to facilitate research on zero-shot text classification in Turkish. For this purpose, we evaluate three methods on nine Turkish datasets covering three main categories (topic, sentiment, and emotion): Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model, which is based on Masked Language Modeling (MLM) and pre-trained word embeddings. We use pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results show that ConvBERT with the NLI method yields the best results with 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using transformer models that had not previously been attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
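
To illustrate the NLI framing named in the abstract, the sketch below turns each candidate label into a hypothesis sentence, scores it against the input text, and picks the highest-scoring label. It is a minimal sketch, not the authors' implementation: the hypothesis template, the function names, and the word-overlap scorer are assumptions made for illustration. In practice the scorer would be replaced by the entailment probability from an NLI-tuned transformer.

```python
from typing import Callable, Dict, Sequence, Tuple


def zero_shot_nli(
    text: str,
    labels: Sequence[str],
    entail_score: Callable[[str, str], float],
    template: str = "This text is about {}.",  # hypothetical template, not from the paper
) -> Tuple[str, Dict[str, float]]:
    """Return the label whose hypothesis scores highest against the text."""
    # One hypothesis per candidate label; the text acts as the premise.
    scores = {label: entail_score(text, template.format(label)) for label in labels}
    best = max(scores, key=scores.get)
    return best, scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): fraction of hypothesis words found in the premise.

    A real system would return an NLI model's entailment probability instead.
    """
    premise_words = {w.strip(".,!?") for w in premise.lower().split()}
    hypothesis_words = [w.strip(".,!?") for w in hypothesis.lower().split()]
    overlap = sum(w in premise_words for w in hypothesis_words)
    return overlap / len(hypothesis_words)


if __name__ == "__main__":
    best, scores = zero_shot_nli(
        "the team won the football match",
        ["football", "economy"],
        toy_entail_score,
    )
    print(best)  # football
```

Because the scorer is passed in as a function, the same loop can be reused with different entailment models without changing the label-selection logic.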


Journal: Information Processing and Management
Publication Date: Feb 9, 2023
Citations: 5

Similar Papers

Improving sentence representation for Vietnamese natural language understanding using optimal transport
Phu Xuan-Vinh Nguyen ... Kiet Van Nguyen
Journal of Intelligent & Fuzzy Systems
27 Jun 2023

GLUECoS: An Evaluation Benchmark for Code-Switched NLP
Simran Khanuja ... Monojit Choudhury
01 Jan 2020

Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan
01 Aug 2021

A Toxic Comment Classification Model Based on Ensemble
Jian Xu ... Yuqing Zhai
Journal of Physics: Conference Series | VOL. 1873
01 Apr 2021
