Identifying cross-lingual plagiarism using rich semantic features and deep neural networks: A study on Arabic-English plagiarism cases

Salha Alzahrani, Hanan Aljuaid

doi:10.1016/j.jksuci.2020.04.009

Open Access

https://doi.org/10.1016/j.jksuci.2020.04.009

Abstract

The rapid growth of digital content creates a need to protect the academic originality of translated texts. Cross-lingual semantic similarity is concerned with measuring the degree of similarity between textual pairs written in two different languages and determining whether they are plagiarized. Unlike existing approaches, which exploit lexical and syntactic features for mono-lingual similarity, this work proposes rich semantic features extracted from cross-language textual pairs, including topic similarity, semantic role labeling, spatial role labeling, named entity recognition, bag-of-stop-words, bag-of-meanings for all terms, n-most frequent terms, n-least frequent terms, and different combinations of these. Knowledge-based semantic networks such as BabelNet and WordNet were used to compute semantic relatedness across languages. This paper investigates two tasks, namely cross-lingual semantic text similarity (CL-STS) and plagiarism detection and judgement (PD), using deep neural networks, which, to the best of our knowledge, have not previously been applied to STS and PD in a cross-lingual setting with such a combination of features. For this purpose, we propose different neural network architectures that solve the PD task as either a binary classification (plagiarized/independently written) or a finer-grained classification (literally translated/paraphrased/summarized/independently written). Deep neural networks were also used as regressors to predict semantic connotations for CL-STS tasks. Experiments were performed on a large, manually constructed dataset of 71,910 Arabic-English pairs drawn from multiple sources. Overall, the results showed that deep neural networks with rich semantic features achieve encouraging results in comparison to the baselines. The proposed classifiers and regressors tend to perform comparably across different neural network architectures, but both the binary and multi-class classifiers outperform the regressors. Finally, the evaluation and analysis of different feature sets reflected the superiority of deeper semantic features in the classification results.
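
To make one of the listed features concrete, the following is a minimal sketch of how an "n-most frequent terms" similarity could be computed for an Arabic-English pair. It is not the authors' implementation: the toy lexicon stands in for a knowledge resource such as BabelNet, and the names TOY_LEXICON, top_n_terms, jaccard, and frequent_term_similarity are hypothetical. Jaccard overlap is assumed here as the comparison measure; the abstract does not say which measure the paper uses.

```python
from collections import Counter

# Hypothetical toy Arabic-to-English lexicon. The paper relies on semantic
# networks such as BabelNet and WordNet; this dictionary is only a stand-in.
TOY_LEXICON = {
    "بحث": "research",
    "الطالب": "student",
    "كتب": "wrote",
    "الجامعة": "university",
}


def top_n_terms(tokens, n):
    """Return the set of the n most frequent tokens."""
    return {term for term, _ in Counter(tokens).most_common(n)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def frequent_term_similarity(arabic_tokens, english_tokens, n=2):
    """Map Arabic tokens to English via the toy lexicon, then compare
    the top-n term sets of both sides (assumed measure: Jaccard)."""
    mapped = [TOY_LEXICON[t] for t in arabic_tokens if t in TOY_LEXICON]
    return jaccard(top_n_terms(mapped, n), top_n_terms(english_tokens, n))


# Arabic side: "research" x3, "student" x2, "wrote" x1 after mapping.
arabic = ["بحث", "بحث", "بحث", "الطالب", "الطالب", "كتب"]
# A candidate translation sharing the same dominant terms.
translated = ["research", "research", "research", "student", "student", "paper"]
# An unrelated English passage.
unrelated = ["rain", "rain", "rain", "weather", "weather", "cloud"]

score_translated = frequent_term_similarity(arabic, translated)
score_unrelated = frequent_term_similarity(arabic, unrelated)
print(score_translated, score_unrelated)
```

In a pipeline like the one the abstract describes, a score of this kind would be one entry in a larger feature vector passed to a classifier or regressor, alongside the other semantic features.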

Journal: Journal of King Saud University - Computer and Information Sciences
Publication Date: Apr 17, 2020
Citations: 10
License type: cc-by-nc-nd
