Abstract

As the Internet and freely available translation tools help us cross language and cultural borders, cross-language plagiarism is bound to rise. In addition, semantic plagiarism, where a student restructures sentences or replaces terms with their synonyms, raises concern in academia. Both kinds of plagiarism are hard to detect because their fingerprints differ from those of the source text, and the plagiarism detection tools currently available are not capable of detecting such cases. In this research, we propose a new approach for detecting both cross-language and semantic plagiarism. We consider Bahasa Melayu as the input language of the submitted document and English as the target language of similar, possibly plagiarised documents. The system shortens the query document using a fuzzy swarm-based summarisation approach, on the view that the summary retains the most important keywords of the document. The summaries are translated into English using the Google Translate Application Programming Interface (API), after which the words are stemmed and stop words are removed. The tokenised documents are sent to the Google AJAX Search API to retrieve similar documents from the World Wide Web. We combine the Stanford Parser and WordNet to determine the level of semantic similarity between the suspected documents and the candidate source documents. The Stanford Parser assigns each term in a sentence its corresponding role, such as noun, verb or adjective. Based on these roles, we represent each sentence in predicate form and measure the similarity between predicates using information content values from the WordNet taxonomy. Our test dataset consists of two sets of Malay documents produced using different plagiarism techniques. The results show that the proposed semantic-based similarity measure achieves higher precision, recall and F-measure than the conventional Longest Common Subsequence (LCS) approach.
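
To make the baseline and the evaluation metrics concrete, the following is a minimal sketch of a token-level LCS similarity together with precision, recall and F-measure. It is not the authors' implementation: the function names, the normalisation by the longer sentence, and the example sentences and document identifiers are all assumptions chosen for illustration.

```python
from typing import Sequence, Set, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length divided by the longer sequence length (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def precision_recall_f(detected: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of detected items against relevant items."""
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical sentences: one near-verbatim copy and one synonym-substituted copy.
source = "the student copied the text".split()
near_copy = "the student rewrote the text".split()
synonym_copy = "the pupil duplicated the passage".split()

print(lcs_similarity(source, near_copy))     # 0.8
print(lcs_similarity(source, synonym_copy))  # 0.4

# Hypothetical document identifiers for a detection run.
print(precision_recall_f({"d1", "d2", "d3"}, {"d1", "d2", "d4", "d5"}))
```

In this toy example the synonym-substituted sentence scores much lower than the near-verbatim copy, because LCS matches surface tokens only. This is the kind of case that a measure based on semantic relatedness is intended to handle.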
