Document copy detection based on kernel method

Bao Jun-Peng, Liu Xiao-Dong, Shen Jun-Yi, Liu Hai-Yan, Zhang Xiao-Di

doi:10.1109/nlpke.2003.1275908

Publication Date: Oct 26, 2003

Citations: 18

Affiliation: Xi'an Jiaotong University

#Word Sequence Kernel #String Kernel

Abstract

We present the semantic sequence kernel (SSK), a method for detecting document plagiarism that is derived from the string kernel (SK) and the word sequence kernel (WSK). SSK first finds the semantic sequences in documents and then uses a kernel function to calculate their similarity. SK and WSK consider only the gap between the first word and the last one, whereas SSK takes into account the position of each common word. We believe SSK captures both local and global information, which substantially improves the detection of small partial plagiarism. We compare SSK with the relative frequency model and with the semantic sequence model, which is a word-frequency-based model. The results show that SSK performs excellently on the non-rewording corpus. It remains valid on the rewording corpus, with some loss of performance.
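
To make the baseline concrete, the following is a minimal sketch of a gap-weighted word sequence kernel of the general kind the abstract attributes to SK and WSK, where each shared word subsequence is penalized only by the span from its first word to its last. It is not the paper's SSK, whose formula the abstract does not give. The function names, the whitespace tokenization, the subsequence length `n`, and the decay factor `decay` are all assumptions made for illustration, and the brute-force enumeration is only practical for short texts.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def subsequence_features(words, n, decay):
    """Map each length-n word subsequence to its summed span weight.

    Assumption: the weight of one occurrence is decay ** span, where span
    counts positions from the first matched word to the last, inclusive.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(words)), n):
        span = idx[-1] - idx[0] + 1
        feats[tuple(words[i] for i in idx)] += decay ** span
    return feats


def word_sequence_kernel(doc_a, doc_b, n=2, decay=0.5):
    """Unnormalized kernel: dot product of the two feature maps."""
    fa = subsequence_features(doc_a.lower().split(), n, decay)
    fb = subsequence_features(doc_b.lower().split(), n, decay)
    return sum(w * fb.get(u, 0.0) for u, w in fa.items())


def normalized_kernel(doc_a, doc_b, n=2, decay=0.5):
    """Cosine-style normalization so that documents of different length compare fairly."""
    kab = word_sequence_kernel(doc_a, doc_b, n, decay)
    kaa = word_sequence_kernel(doc_a, doc_a, n, decay)
    kbb = word_sequence_kernel(doc_b, doc_b, n, decay)
    if kaa == 0.0 or kbb == 0.0:
        return 0.0  # a document shorter than n words has no features
    return kab / sqrt(kaa * kbb)
```

Because the weight depends only on the span, two occurrences with the same first and last positions receive the same penalty regardless of where the interior words fall. This is the limitation that, according to the abstract, SSK addresses by using the position of each common word.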
