Cross-language plagiarism detection

Martin Potthast, Alberto Barrón-Cedeño, Benno Stein, Paolo Rosso

doi:10.1007/s10579-009-9114-z

Abstract

Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences from monolingual plagiarism detection, (2) state-of-the-art solutions for two important subtasks are reviewed, (3) retrieval models for the assessment of cross-language similarity are surveyed, and (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents which are selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all of the six languages English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but on arbitrary pairs of languages. CL-ASA works best on “exact” translations but does not generalize well.
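To make the ranking task concrete, the following is a minimal sketch of a character n-gram similarity in the spirit of CL-CNG. It is not the authors' implementation: the abstract does not state the n-gram length, the text normalization, or the weighting scheme, so `n = 3`, the lowercase alphanumeric normalization, raw counts, and the helper names `char_ngrams`, `cosine`, and `rank_candidates` are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a normalized text."""
    # Assumption: lowercase the text and keep only letters and digits,
    # so whitespace and punctuation do not produce n-grams.
    normalized = "".join(ch for ch in text.lower() if ch.isalnum())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(query: str, candidates: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Rank candidate texts by n-gram similarity to the query, best first."""
    q = char_ngrams(query, n)
    scored = [(cosine(q, char_ngrams(c, n)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    ranking = rank_candidates(
        "international organization",
        ["organización internacional", "das Wetter ist schön"],
    )
    for score, text in ranking:
        print(f"{score:.3f}  {text}")
```

A model of this kind needs no translation resources at all, which is consistent with the abstract's remark that CL-CNG is simple; it only works when the two languages share enough character sequences, which is why it is tied to syntactically related languages.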

Journal: Language Resources and Evaluation
Publication Date: Jan 30, 2010
Citations: 221
License type: other-oa

Similar Papers

Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language
Marc Franco-Salvador ... Rafael E Banchs
Knowledge-Based Systems | VOL. 111
06 Aug 2016

Analysis on the Effect of Term-Document's Matrix to the Accuracy of Latent-Semantic-Analysis-Based Cross-Language Plagiarism Detection
Anak Agung Putri Ratna ... Prima Dewi Purnamasari
17 Dec 2016

PlagAL: Plagiarism detection system for Albanian texts
Lamir Shkurti ... Jaumin Ajdari
07 Jun 2021

Cross-lingual plagiarism detection techniques for English-Hindi language pairs
Basant Agarwal
Journal of Discrete Mathematical Sciences and Cryptography | VOL. 22
19 May 2019
