Abstract
This paper proposes an algorithm for document plagiarism detection based on incremental knowledge construction with formal concept analysis (FCA). The incremental knowledge construction supports document matching between the source document in storage and the suspect document. A new concept similarity measure is also proposed for retrieving formal concepts from the knowledge construction; it employs appearance frequencies in the obtained knowledge construction. The approach can be applied to retrieve relevant information because the obtained structure represents knowledge as FCA concepts, each definable by a conjunction of properties. The measure is mathematically proven to be a formal similarity metric, and its performance is demonstrated in document plagiarism detection. Moreover, this paper provides an algorithm to build the information structure for document plagiarism detection. Thai text test collections are used to evaluate the performance of the implemented web application.
Highlights
Plagiarism has increased because of easy access to data on the World Wide Web
Ekbal et al. [40] propose a textual-similarity technique for external plagiarism detection that uses a vector space model, an information retrieval (IR) technique, to compare source and suspect documents
Document plagiarism detection using formal concept analysis (FCA) aims to detect good matches between the source document in storage and a suspect document
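As a rough illustration of the vector space comparison mentioned in the highlights, the sketch below scores a suspect document against a source document by the cosine of their term-frequency vectors. It is not the method of Ekbal et al. or of this paper: the function names `term_vector` and `cosine_similarity`, the raw term-frequency weighting, and the whitespace tokenization are all assumptions made for the example. Thai text in particular would need a proper word segmenter in place of `split()`.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumed weighting) from whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


source = term_vector("formal concept analysis builds a concept lattice")
suspect = term_vector("a concept lattice is built by formal concept analysis")
score = cosine_similarity(source, suspect)
```

A score near 1 suggests heavy term overlap between the two documents and a score of 0 means they share no terms; any decision threshold would have to be tuned on a test collection.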
Summary
Plagiarism has increased because of easy access to data on the World Wide Web. This work applies FCA to detect document plagiarism. The method provides related documents or groups of documents to the user. The application requires a similarity measure to retrieve source documents or to identify groups of similar documents in a concept hierarchy. Concept similarity in FCA has gained importance from its application to plagiarism detection, which has to assess the similarity between formal concepts to find relevant information. We present and investigate a candidate algorithm to support plagiarism detection with the proposed concept similarity measure.
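To make the idea of formal concepts and concept similarity concrete, the sketch below builds all formal concepts of a tiny document-by-term context and compares two of them. Several things here are assumptions for illustration only: the toy `context`, the helper names, the brute-force enumeration (which is not the paper's incremental construction), and the weighted Jaccard score, which stands in for the paper's frequency-based measure because that measure is not specified in this summary.

```python
from itertools import combinations

# Toy formal context (hypothetical data): documents are objects, terms are attributes.
context = {
    "doc1": {"fca", "lattice", "concept"},
    "doc2": {"fca", "concept", "plagiarism"},
    "doc3": {"plagiarism", "detection"},
}


def common_attributes(objects, context):
    """Attributes shared by every given object (all attributes if none are given)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return frozenset(result)


def common_objects(attributes, context):
    """Objects that possess every given attribute."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Exponential in the number of objects; suitable only for tiny examples.
    """
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def concept_similarity(c1, c2, weight=0.5):
    """Stand-in similarity: weighted Jaccard overlap of extents and intents."""
    return weight * jaccard(c1[0], c2[0]) + (1 - weight) * jaccard(c1[1], c2[1])


concepts = formal_concepts(context)
shared = (frozenset({"doc1", "doc2"}), frozenset({"fca", "concept"}))
single = (frozenset({"doc1"}), frozenset({"fca", "lattice", "concept"}))
similarity = concept_similarity(shared, single)
```

Each concept pairs a group of documents (the extent) with the terms they all share (the intent), which is what lets a concept hierarchy return groups of related source documents for a suspect document.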