Abstract

We present the semantic sequence kernel (SSK) for detecting document plagiarism. It is derived from the string kernel (SK) and the word sequence kernel (WSK). SSK first finds the semantic sequences in documents and then uses a kernel function to calculate their similarity. SK and WSK only account for the gap between the first word and the last one, whereas SSK takes into account the position information of each common word. We believe SSK captures both local and global information, which makes it considerably better at detecting plagiarism of small parts of a document. We compare SSK with the relative frequency model and the semantic sequence model, which is a word-frequency-based model. The results show that SSK performs very well on the non-rewording corpus. It remains valid on the rewording corpus, with some loss in performance.
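
To make the contrast with SK and WSK concrete, the following is a minimal sketch of a generic gap-weighted word sequence kernel, in which each shared word subsequence is penalized only by the span between its first and last word. It is not the SSK proposed here and is not taken from the paper: the function names, the brute-force enumeration, and the parameters `n` (subsequence length) and `decay` (gap penalty) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def subsequence_features(words, n, decay):
    """Map each length-n word subsequence to its gap-weighted count.

    Every occurrence contributes decay ** span, where span is the distance
    from the first to the last matched word (inclusive). Positions of the
    words in between are ignored, which is the limitation the abstract
    attributes to SK and WSK.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(words)), n):
        span = idx[-1] - idx[0] + 1
        features[tuple(words[i] for i in idx)] += decay ** span
    return features


def word_sequence_kernel(doc_a, doc_b, n=2, decay=0.5):
    """Inner product of the two gap-weighted feature maps (brute force)."""
    feats_a = subsequence_features(doc_a, n, decay)
    feats_b = subsequence_features(doc_b, n, decay)
    return sum(v * feats_b[k] for k, v in feats_a.items() if k in feats_b)


def normalized_kernel(doc_a, doc_b, n=2, decay=0.5):
    """Scale the kernel to [0, 1] so documents of different length compare."""
    k_ab = word_sequence_kernel(doc_a, doc_b, n, decay)
    k_aa = word_sequence_kernel(doc_a, doc_a, n, decay)
    k_bb = word_sequence_kernel(doc_b, doc_b, n, decay)
    if k_aa == 0.0 or k_bb == 0.0:
        return 0.0
    return k_ab / sqrt(k_aa * k_bb)


if __name__ == "__main__":
    original = "the kernel measures document similarity".split()
    copied = "the kernel roughly measures document similarity".split()
    print(normalized_kernel(original, copied))
```

The enumeration is exponential in `n` and is only meant for short token lists; practical SK and WSK implementations use dynamic programming instead.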
