Abstract

Cases of plagiarism are increasing steadily and have become a serious problem, driven by the quantity of textual information available on the web. Data mining underpins many domains, and one of its tasks, text categorization, can be applied to the problem of automatic plagiarism detection. This article presents a new approach to combating plagiarism named MML (Multi-agents Machine Learning system), composed of three modules: data preparation and digitalization, using character n-grams or bag of words as the text representation; TF*IDF weighting, which calculates the importance of each term in the corpus in order to transform each document into a vector; and a learning and voting phase using three supervised learning algorithms (C4.5 decision tree, naïve Bayes and support vector machine).
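
The three modules can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (char_ngrams, bag_of_words, tfidf_vectors, majority_vote) are hypothetical, the TF*IDF variant (relative term frequency times the natural log of N/df) is one common choice assumed here, and the three classifiers are represented only by the labels they would output, combined by simple majority.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_words(text):
    """Return the whitespace-separated, lowercased words of a text."""
    return text.lower().split()


def tfidf_vectors(docs_terms):
    """Turn tokenized documents into sparse {term: weight} vectors.

    Assumed weighting: (count / document length) * ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    vectors = []
    for terms in docs_terms:
        tf = Counter(terms)
        total = len(terms) or 1  # avoid division by zero on empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def majority_vote(labels):
    """Return the label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical usage: represent two short documents, then combine three
# classifier outputs (stand-ins for C4.5, naive Bayes and SVM predictions).
docs = ["the cat sat", "the dog sat down"]
vectors = tfidf_vectors([bag_of_words(d) for d in docs])
decision = majority_vote(["plagiarized", "original", "plagiarized"])
```

With three voters and two classes, a strict majority always exists, so no tie-breaking rule is needed in this setting. Terms that occur in every document receive a weight of zero under this weighting, which is why such terms contribute nothing to the document vectors.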
