Abstract

Detailed comparison is an important sub-task of external plagiarism detection, and it often relies on seed heuristics extracted between two documents. The vector space model (VSM) and the Jaccard coefficient are commonly used similarity measures for this purpose: VSM tends to yield high recall, whereas the Jaccard coefficient tends to yield high precision. In this paper, we propose a hybrid similarity measure that integrates VSM and the Jaccard coefficient into a unified model, based on a fitting function for the optimal dividing line between plagiarism and non-plagiarism. The method exploits the advantages of both measures and extracts more reasonable heuristic seeds for plagiarism detection. We evaluate it on the PAN corpus of CLEF (Cross-Language Evaluation Forum) and compare it with methods based on VSM or the Jaccard coefficient alone. Experimental results show that our method achieves better performance.
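
As a rough illustration of the two measures named above, the following sketch computes a term-frequency cosine similarity (the usual VSM score) and the Jaccard coefficient for a pair of tokenized passages, then tests the pair against a dividing line in the (cosine, Jaccard) plane. The function names, the whitespace tokenization, and in particular the default `boundary` are assumptions made for illustration; the abstract does not give the fitted function, so this is not the paper's model.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a_tokens, b_tokens):
    """VSM score: cosine of the angle between raw term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard_coefficient(a_tokens, b_tokens):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_seed(a_tokens, b_tokens, boundary=lambda cos: 1.0 - cos):
    """Keep a passage pair as a seed if it lies on or above the dividing line.

    `boundary` maps a cosine score to the minimum Jaccard score required.
    The default line (jaccard = 1 - cosine) is a hypothetical placeholder,
    not the fitting function from the paper.
    """
    cos = cosine_similarity(a_tokens, b_tokens)
    jac = jaccard_coefficient(a_tokens, b_tokens)
    return jac >= boundary(cos)


if __name__ == "__main__":
    suspicious = "the quick brown fox".split()
    source = "the quick brown dog".split()
    print(cosine_similarity(suspicious, source))
    print(jaccard_coefficient(suspicious, source))
    print(is_seed(suspicious, source))
```

Passing the boundary as a callable keeps the two scores separate from the decision rule, so a line fitted on labeled data could be substituted without changing the similarity functions.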
