Abstract

Most text mining systems are based on statistical analysis of term frequency. Such analysis captures the importance of a term (a word or phrase) within a document, but the techniques proposed so far still need to improve their ability to detect plagiarized passages, particularly by capturing the importance of a term within a sentence. Two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. In this paper, we discriminate between terms that are important and unimportant to the meaning of their sentences, and apply this distinction to idea plagiarism detection. We introduce an idea plagiarism detection method based on the semantic meaning frequency of important terms in sentences. The proposed method analyses and compares texts based on the semantic role allocated to each term within a sentence. Semantic role labeling (SRL) offers significant advantages by generating the arguments of each sentence semantically. Experiments on the CS11 dataset gave promising results, showing that the proposed technique outperforms recent plagiarism detection methods in terms of recall, precision, and F-measure.
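
To make the central observation concrete, the following minimal sketch contrasts raw term frequency with a frequency weighted by semantic role. It is an illustration under stated assumptions, not the method evaluated in the paper: the role labels are assigned by hand rather than by an SRL tool, and the names `ROLE_WEIGHTS`, `raw_frequency`, and `role_weighted_frequency`, as well as the weight values themselves, are hypothetical.

```python
from collections import Counter

# Hypothetical role weights, chosen only for illustration (not from the paper):
# core arguments and the verb count fully, modifiers count much less.
ROLE_WEIGHTS = {"ARG0": 1.0, "V": 1.0, "ARG1": 1.0, "ARGM": 0.25}


def raw_frequency(sentences):
    """Count every occurrence of a term, ignoring its semantic role."""
    counts = Counter()
    for sentence in sentences:
        for term, _role in sentence:
            counts[term] += 1
    return counts


def role_weighted_frequency(sentences, weights=ROLE_WEIGHTS):
    """Sum the role weight of every occurrence of a term."""
    scores = Counter()
    for sentence in sentences:
        for term, role in sentence:
            # Roles without an assumed weight contribute nothing.
            scores[term] += weights.get(role, 0.0)
    return scores


# Hand-labelled toy document; a real system would obtain roles from an SRL tool.
document = [
    [("student", "ARG0"), ("copied", "V"), ("essay", "ARG1"), ("yesterday", "ARGM")],
    [("teacher", "ARG0"), ("graded", "V"), ("essay", "ARG1"), ("yesterday", "ARGM")],
]

raw = raw_frequency(document)
weighted = role_weighted_frequency(document)

for term in ("essay", "yesterday"):
    print(term, raw[term], weighted[term])
```

In this toy document "essay" and "yesterday" occur equally often, so raw frequency cannot separate them, while the role-weighted score ranks the core argument "essay" above the modifier "yesterday".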

Introduction

Given the vastness of the Internet, plagiarism, the intentional use of someone else's original material without acknowledging its source, has become a serious problem in areas such as literature, science, and education. The challenge is exacerbated when the suspected text is generated semantically, which is known as idea plagiarism. The difficulty lies not only in manually capturing the concept or idea that was taken, but also in people's lack of awareness of writing ethics and text paraphrasing. Several works have addressed text plagiarism detection based on the lexical and syntactic structure of the writing, but they fail to detect semantic and idea plagiarism. Most of these methods are designed for verbatim copies, and their similarity performance decreases on heavily obfuscated plagiarism [2], owing to paraphrasing and semantic similarity. (Velásquez et al. [8]; Weber-Wulff [9])
