Multi-Lingual Language Variety Identification using Conventional Deep Learning and Transfer Learning Approaches

Sameeah Noreen Hameed, Qiao Ya-Nan, Muhammad Adnan Ashraf

doi:10.34028/iajit/19/5/1

Abstract

Language variety identification aims to identify lexical and semantic variations across different varieties of a single language. It helps build the linguistic profile of an author from written text, which can be used for cyber forensics and marketing purposes. Reviewing previous work on language variety identification, we found hardly any study that experiments with transfer learning approaches or performs a thorough comparison of different deep learning approaches on a range of benchmark datasets. To bridge this gap, we propose transfer learning approaches for language variety identification and compare them extensively with deep learning approaches on multiple varieties of four widely spoken languages: Arabic, English, Portuguese, and Spanish. We treat the task as a binary classification problem for Portuguese and as a multi-class classification problem for Arabic, English, and Spanish. We applied two transfer learning approaches, Bidirectional Encoder Representations from Transformers (BERT) and Universal Language Model Fine-tuning (ULMFiT); three deep learning approaches, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), and Gated Recurrent Units (GRU); and an ensemble approach for identifying different varieties. The comparison suggests that the transfer learning based ULMFiT model outperforms all other approaches and produces the best accuracy for both the binary and multi-class language variety identification tasks.
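The abstract does not say how the ensemble combines its member models, so the following is only an illustrative sketch that assumes hard majority voting over per-model label predictions, together with the accuracy metric used for comparison. The function names (`majority_vote`, `accuracy`), the tie-breaking rule, and the variety labels `pt-BR` and `pt-PT` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine label predictions from several models by hard majority voting.

    predictions[m][i] is the label that model m assigns to document i.
    Assumption: on a tie, the tied label predicted by the earliest model wins.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_docs = len(predictions[0])
    if any(len(p) != n_docs for p in predictions):
        raise ValueError("all models must predict the same number of documents")

    combined = []
    for votes in zip(*predictions):  # votes = one label per model for a document
        counts = Counter(votes)
        best = max(counts.values())
        # Scan votes in model order so ties resolve to the earliest model.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of documents whose predicted label matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy binary example with made-up labels and made-up model outputs.
gold = ["pt-BR", "pt-PT", "pt-BR", "pt-PT"]
cnn = ["pt-BR", "pt-BR", "pt-BR", "pt-PT"]
bilstm = ["pt-BR", "pt-PT", "pt-PT", "pt-PT"]
gru = ["pt-PT", "pt-PT", "pt-BR", "pt-PT"]

ensemble = majority_vote([cnn, bilstm, gru])
print(accuracy(gold, cnn), accuracy(gold, ensemble))
```

The same two functions apply unchanged to the multi-class setting, since voting and accuracy only compare label strings. A soft-voting variant that averages class probabilities would be an equally plausible reading of "ensemble approach".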


More From: The International Arab Journal of Information Technology


