TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING

Nursyahirah Tarmizi, Dayang Hanani Abang Ibrahim, Suhaila Saee

doi:10.11113/aej.v13.19171


Open Access


Abstract

Online Social Networks (OSNs) are frequently used to carry out cyber-criminal acts such as cyberbullying. As a developing Asian country that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. The Author Identification (AI) task plays a vital role in social media forensic (SMF) investigation, where it helps unveil the genuine identity of an offender by analysing the OSN text written by candidate culprits. AI faces several challenges with OSN text, including limited text length and informal language full of internet jargon and grammatical errors, which further degrade AI performance in SMF. Traditional AI systems designed for long documents appear inadequate for analysing the writing style of short OSN texts. N-gram features have been shown to represent authors' writing style effectively in short text. However, representing N-grams with traditional schemes such as TF-IDF produces sparse vectors and makes it difficult to capture the semantic information in the text. Moreover, most AI work has been done in English, while indigenous languages have received little attention. In East Malaysia, the major languages that transcend ethnic boundaries are Iban in Sarawak and KadazanDusun in Sabah, both of which are inherently under-resourced. This paper presents a proposed AI workflow for short OSN text in two Under-Resourced Languages (U-RLs), using Iban and KadazanDusun tweets, to help curb cyberbullying in Malaysia. It compares TF-IDF (sparse) and state-of-the-art (SoA) embedding-based (dense) feature representations to determine which best captures the stylistic features of the authors' writing. Word, character, and POS N-grams were extracted as features. The representations were used to train traditional machine learning classifiers (Naïve Bayes, Random Forest, and SVM), and a convolutional neural network (CNN), a SoA deep learning model for sentence classification, was tested against them.
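To make the sparse side of the comparison concrete, the following pure-Python sketch combines word, character, and POS N-grams into one feature set and weights them with TF-IDF. It is an illustration under stated assumptions, not the authors' pipeline: the lowercased whitespace tokenisation, the N-gram orders (word uni/bigrams, character trigrams, POS bigrams), the `w:`/`c:`/`p:` prefixes, the English stand-in tweets and their hand-assigned Universal POS tags are all hypothetical. Real Iban or KadazanDusun text would need its own tokeniser and tagger.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Contiguous n-grams over a list of strings, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Overlapping character n-grams over the raw string, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(text, pos_tags):
    """Combine word uni/bigrams, character trigrams and POS bigrams.

    The prefixes (an assumption of this sketch) keep the three feature
    families apart, e.g. the word 'the' vs the character trigram 'the'.
    """
    tokens = text.lower().split()
    feats = ["w:" + g for g in word_ngrams(tokens, 1) + word_ngrams(tokens, 2)]
    feats += ["c:" + g for g in char_ngrams(text.lower(), 3)]
    feats += ["p:" + g for g in word_ngrams(pos_tags, 2)]
    return feats


def tfidf(feature_lists):
    """Sparse TF-IDF: one L2-normalised {feature: weight} dict per document.

    idf uses the smoothed form log((1 + N) / (1 + df)) + 1.
    """
    counts = [Counter(f) for f in feature_lists]
    n_docs = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: count each feature once per doc
    vectors = []
    for c in counts:
        vec = {f: tf * (math.log((1 + n_docs) / (1 + df[f])) + 1) for f, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0  # guard empty docs
        vectors.append({f: w / norm for f, w in vec.items()})
    return vectors, sorted(df)


# Hypothetical English stand-ins for tweets, with hand-assigned Universal POS tags.
tweets = [
    ("see you at the match tonight", ["VERB", "PRON", "ADP", "DET", "NOUN", "NOUN"]),
    ("wait for the match tonight", ["VERB", "ADP", "DET", "NOUN", "NOUN"]),
    ("good morning everyone", ["ADJ", "NOUN", "PRON"]),
]
vectors, vocab = tfidf([extract_features(text, tags) for text, tags in tweets])
for i, vec in enumerate(vectors):
    print(f"tweet {i}: {len(vec)} non-zero weights out of {len(vocab)} dimensions")
```

Even with three short texts, each tweet uses only part of the shared feature space, and that fraction shrinks quickly as more authors and tweets enlarge the vocabulary. This is the sparsity the abstract attributes to TF-IDF. Classifiers such as Naïve Bayes or an SVM would then be trained on these vectors.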
Results were compared across combinations of representation models and classifiers on three datasets (English, Iban, and KadazanDusun). The best results were achieved when the CNN learned from the embedding-based representations with all features combined: KadazanDusun achieved the highest accuracy at 95.76%, followed by English at 95.02% and Iban at 94%.
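The abstract does not describe the CNN's architecture beyond calling it a SoA sentence classifier. The following pure-Python sketch only illustrates how a single-layer text CNN of the kind commonly used for sentence classification consumes dense embeddings: embedding lookup, a 1-D convolution over token windows, ReLU, max-over-time pooling, and a softmax over candidate authors. The vocabulary, embedding size, filter width, filter count, number of authors, and the random untrained weights are all hypothetical. A real model would learn these parameters in a deep learning framework and would typically use several filter widths.

```python
import math
import random

random.seed(0)  # deterministic toy weights; nothing here is trained

EMB_DIM = 4      # hypothetical embedding size
WINDOW = 2       # hypothetical filter width (tokens per window)
N_FILTERS = 3    # hypothetical number of convolution filters
N_AUTHORS = 2    # hypothetical number of candidate authors

# Hypothetical vocabulary; unseen tokens map to <unk>.
token_index = {"<unk>": 0, "see": 1, "you": 2, "at": 3, "the": 4, "match": 5}


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embeddings = rand_matrix(len(token_index), EMB_DIM)
# Each filter spans WINDOW consecutive embeddings, flattened to WINDOW * EMB_DIM weights.
filters = rand_matrix(N_FILTERS, WINDOW * EMB_DIM)
filter_bias = [0.0] * N_FILTERS
dense = rand_matrix(N_AUTHORS, N_FILTERS)
dense_bias = [0.0] * N_AUTHORS


def embed(tokens):
    """Look up a dense vector for each token."""
    return [embeddings[token_index.get(t, token_index["<unk>"])] for t in tokens]


def conv_maxpool(emb):
    """Slide each filter over token windows, apply ReLU, keep the max per filter."""
    pooled = []
    for w, b in zip(filters, filter_bias):
        activations = []
        for start in range(len(emb) - WINDOW + 1):
            window = [x for vec in emb[start:start + WINDOW] for x in vec]
            activations.append(max(0.0, sum(wi * xi for wi, xi in zip(w, window)) + b))
        # Texts shorter than the filter width produce no windows; pool to 0.0.
        pooled.append(max(activations) if activations else 0.0)
    return pooled


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_author_probs(text):
    """Return a probability per candidate author for one short text."""
    features = conv_maxpool(embed(text.lower().split()))
    logits = [sum(wi * fi for wi, fi in zip(row, features)) + b
              for row, b in zip(dense, dense_bias)]
    return softmax(logits)


probs = predict_author_probs("see you at the match")
print(probs)
```

With trained weights, the index with the highest probability would be the predicted author. Here the weights are random, so the output only demonstrates the shapes and operations. The abstract reports feeding all feature types to the CNN but does not say how they were combined, so this sketch uses word tokens only.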

Journal: ASEAN Engineering Journal
Publication Date: May 31, 2023
Citations: 1

Similar Papers

Author identification for Under-Resourced language (KadazanDusun)
Nursyahirah Tarmizi ... Suhaila Saee
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 17 | 01 Jan 2020

Author Profiling Approach: Predicting Personality Traits on Twitter Data using Combined BERT and SimCSE Embeddings
Kottu Divya Jyothi
International Journal for Research in Applied Science and Engineering Technology | VOL. 12 | 30 Jun 2024

A Deep Learning Approach for Author Profiling using Word Embeddings
Dr T Raghunadha Reddy ... S K Fayaz
International Journal for Research in Applied Science and Engineering Technology | VOL. 11 | 31 May 2023

Lithuanian Author Profiling with the Deep Learning
Jurgita Kapočiūtė-Dzikienė ... Robertas Damaševičius
26 Sep 2018
