Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus

Bela Gipp, Norman Meuschke, Corinna Breitinger

doi:10.1002/asi.23228

Abstract

The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character‐based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language‐independent approach to plagiarism detection, Citation‐based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character‐based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation‐based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character‐based approaches. Finally, upon combining the citation‐based with the traditional character‐based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification.
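
The abstract describes CbPD as comparing the placement of citations in the full text rather than the words themselves. The sketch below illustrates that idea under stated assumptions: each document is reduced to the ordered list of sources it cites, and similarity is scored by the longest common subsequence of those lists. The abstract does not name the algorithm the authors use, so the function names, the normalization by the shorter list, and the reference keys are all hypothetical.

```python
from typing import Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Each sequence lists cited-source identifiers in the order they appear
    in a document's full text (an assumed representation, not the paper's).
    """
    prev = [0] * (len(b) + 1)
    for ref_a in a:
        curr = [0]
        for j, ref_b in enumerate(b, start=1):
            if ref_a == ref_b:
                # Same source cited in both documents: extend the match.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry forward the best match found so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared citation order, normalized by the shorter sequence (assumed)."""
    if not a or not b:
        return 0.0
    return longest_common_citation_sequence(a, b) / min(len(a), len(b))


# Hypothetical reference keys; the wording of the two texts is irrelevant here.
doc_a = ["smith2001", "lee2005", "chen2010", "kim2012"]
doc_b = ["smith2001", "chen2010", "patel2009", "kim2012"]
score = citation_order_similarity(doc_a, doc_b)
```

Because the score depends only on which sources are cited and in what order, it is unaffected by paraphrasing or translation of the surrounding text, which is the property the abstract attributes to the citation-based approach.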

Journal: Journal of the Association for Information Science and Technology
Publication Date: Jun 4, 2014
Citations: 41

Similar Papers

Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
Norman Meuschke ... Bela Gipp
01 Sep 2014

Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
08 Sep 2014

Editorial 9(1)
Tracey Bretag
International Journal for Educational Integrity | VOL. 9
06 Jun 2013

Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag
Bela Gipp ... Norman Meuschke
13 Jun 2011
