Abstract

This paper presents the results of an experimental investigation into the impact of term-document matrix variations on the accuracy of cross-language LSA-based plagiarism detection. The experiment focused on comparing Indonesian and English papers. Increasing the document definition size used as the source for matrix construction had a significant negative impact on detection accuracy in all scenarios. The experiments showed that the document definition size must be kept below 10 to maintain high accuracy, and that detection performance was worst at a size of 25. Additionally, building the term-document matrix from word occurrence frequencies improved detection accuracy compared to the binary implementation, which records only the presence or absence of words.
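To make the two matrix variants concrete, the following is a minimal sketch of a frequency-based and a binary term-document matrix, compared through cosine similarity between document columns. It is an illustration under stated assumptions, not the paper's implementation: the function names (`term_document_matrix`, `column`, `cosine`), the whitespace tokenizer, and the toy documents are all hypothetical, and the sketch omits both the cross-language step and the SVD rank reduction that LSA applies to the matrix.

```python
from collections import Counter
import math


def term_document_matrix(docs, binary=False):
    """Build a term-document matrix: one row per term, one column per document.

    With binary=False each cell holds the term's occurrence count;
    with binary=True each cell is 1 if the term occurs and 0 otherwise.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = []
    for term in vocab:
        row = []
        for doc_counts in counts:
            freq = doc_counts[term]
            row.append((1 if freq > 0 else 0) if binary else freq)
        matrix.append(row)
    return vocab, matrix


def column(matrix, j):
    """Return document j as a vector over the vocabulary."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy documents, used only to show the two weightings side by side.
docs = ["a a b", "b c"]
_, freq_matrix = term_document_matrix(docs)
_, bin_matrix = term_document_matrix(docs, binary=True)

print("frequency:", cosine(column(freq_matrix, 0), column(freq_matrix, 1)))
print("binary:   ", cosine(column(bin_matrix, 0), column(bin_matrix, 1)))
```

The two weightings give different similarity scores for the same pair of documents because the frequency matrix retains how often each word occurs, while the binary matrix discards it. In a full LSA pipeline, either matrix would be passed to a truncated SVD before similarities are computed.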

