Abstract

Rapid growth in the digital era has created a need to protect the academic originality of translated texts. Cross-lingual semantic similarity is concerned with measuring the degree of similarity between textual pairs written in two different languages and determining whether they are plagiarized. Unlike existing approaches, which exploit lexical and syntactic features for mono-lingual similarity, this work proposes rich semantic features extracted from cross-language textual pairs, including topic similarity, semantic role labeling, spatial role labeling, named entity recognition, bag-of-stop-words, bag-of-meanings for all terms, n-most frequent terms, n-least frequent terms, and different combinations of these. Knowledge-based semantic networks such as BabelNet and WordNet were used to compute semantic relatedness across languages. This paper investigates two tasks, namely cross-lingual semantic text similarity (CL-STS) and plagiarism detection and judgement (PD), using deep neural networks, which, to the best of our knowledge, have not previously been applied to STS and PD in a cross-lingual setting with such a combination of features. For this purpose, we propose different neural network architectures that solve the PD task either as binary classification (plagiarized/independently written) or as a finer-grained classification (literally translated/paraphrased/summarized/independently written). Deep neural networks were also used as regressors to predict the degree of semantic similarity for the CL-STS task. Experiments were performed on a large, manually constructed dataset drawn from multiple sources and consisting of 71,910 Arabic-English pairs. Overall, the results show that deep neural networks with rich semantic features achieve encouraging results in comparison to the baselines. The proposed classifiers and regressors show comparable performance across different neural network architectures, but both the binary and multi-class classifiers outperform the regressors. Finally, the evaluation and analysis of different feature sets show that deeper semantic features yield better classification results.
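
To make three of the simpler features listed above more concrete (bag-of-stop-words, n-most frequent terms, n-least frequent terms), the following is a minimal sketch and not the authors' implementation. It assumes that both texts have already been tokenized and mapped into a shared vocabulary by some earlier step, for example via concept identifiers from a resource such as BabelNet; that mapping is not shown. The stop-word list, the function names, and the use of Jaccard overlap as the comparison measure are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical, tiny stop-word list used only for illustration.
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "to"}


def jaccard(a, b):
    """Jaccard overlap between two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _content_counts(tokens):
    """Term frequencies with stop words excluded."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def most_frequent(tokens, n):
    """The n most frequent content terms (ties broken alphabetically)."""
    ranked = sorted(_content_counts(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def least_frequent(tokens, n):
    """The n least frequent content terms (ties broken alphabetically)."""
    ranked = sorted(_content_counts(tokens).items(), key=lambda kv: (kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def pair_features(src_tokens, tgt_tokens, n=3):
    """Overlap features for one text pair, assuming a shared token space."""
    src_stops = [t for t in src_tokens if t in STOP_WORDS]
    tgt_stops = [t for t in tgt_tokens if t in STOP_WORDS]
    return {
        "stop_word_overlap": jaccard(src_stops, tgt_stops),
        "most_frequent_overlap": jaccard(
            most_frequent(src_tokens, n), most_frequent(tgt_tokens, n)
        ),
        "least_frequent_overlap": jaccard(
            least_frequent(src_tokens, n), least_frequent(tgt_tokens, n)
        ),
    }
```

Under these assumptions, a feature dictionary like the one returned by `pair_features` could be concatenated with the other semantic features and passed to a classifier or regressor.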
