The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

Leilei Kong, Zhongyuan Han, Haoliang Qi

doi:10.3837/tiis.2019.09.026

Abstract

Plagiarism detection increasingly relies on text alignment. Text alignment extracts the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristic methods have achieved excellent performance in text alignment. However, further improvement of these heuristic methods depends mainly on the experience of experts, which limits their capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text segment pairs, we formalize the text alignment task as a sequence labeling problem, improving on current methods at the model level. Specifically, this paper proposes using a probabilistic graphical model to tag the observed sequence of text segment pairs. We therefore present a sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013 and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
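To make the sequence labeling framing concrete, the following is a minimal sketch and not the authors' model. It treats each pair of text segments as one position in a sequence, scores it with a single similarity feature, and decodes the best label sequence with Viterbi, the inference step used by linear-chain Conditional Random Fields. The label set (`O`, `P`), the Jaccard feature, and the hand-set `stay`/`switch` weights are all assumptions for illustration; a trained CRF would learn its weights from annotated data.

```python
from typing import List, Sequence

# Hypothetical label set: "O" = not plagiarized, "P" = plagiarized.
LABELS = ("O", "P")


def jaccard(a: str, b: str) -> float:
    """Token-set overlap of two text segments (an assumed, simplistic feature)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def independent_labels(similarities: Sequence[float]) -> List[str]:
    """Baseline: label each segment pair on its own, ignoring its neighbours."""
    return ["P" if sim > 0.5 else "O" for sim in similarities]


def viterbi(similarities: Sequence[float],
            stay: float = 0.2, switch: float = -0.2) -> List[str]:
    """Decode the highest-scoring label sequence for a chain of segment pairs.

    Emission scores come from the similarity of each pair; transition scores
    reward keeping the same label as the previous position. Both are hand-set
    assumptions here, not learned parameters.
    """
    if not similarities:
        return []

    n = len(LABELS)

    def emit(sim: float) -> List[float]:
        # Index 0 scores label "O", index 1 scores label "P".
        return [0.5 - sim, sim - 0.5]

    def trans(i: int, j: int) -> float:
        return stay if i == j else switch

    score = emit(similarities[0])
    backpointers = []
    for sim in similarities[1:]:
        e = emit(sim)
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for arriving at label j.
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + trans(best_i, j) + e[j])
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [LABELS[i] for i in path]


if __name__ == "__main__":
    sims = [0.1, 0.8, 0.45, 0.9, 0.1]
    print(independent_labels(sims))
    print(viterbi(sims))
```

In this toy input the third pair falls just below the independent threshold, yet the chain model labels it `P` because both of its neighbours are strongly similar. This is the kind of positional and contextual information that the sequence labeling formulation is meant to exploit.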

More From: KSII Transactions on Internet and Information Systems

Similar Papers

A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model
Leilei Kong ... Haoliang Qi
International Journal of Database Theory and Application | VOL. 9
31 Dec 2016

Adaptive Algorithm for Plagiarism Detection: The Best-Performing Approach at PAN 2014 Text Alignment Competition
Miguel A Sanchez-Perez ... Grigori Sidorov
01 Jan 2015

Predicting Type of Obfuscation to Enhance Text Alignment Algorithms
Fatemeh Mashhadirajab ... Mehrnoush Shamsfard
01 Jan 2018

Instructor-centric source code plagiarism detection and plagiarism corpus
Jonathan Y.H Poon ... Min-Yen Kan
03 Jul 2012
