Plagiarism detection on bigdata using modified map-reduced based SCAM algorithm

Jayshree Dwivedi, Abhigyan Tiwary

doi:10.1109/icimia.2017.7975533

Abstract

Plagiarism is one of the biggest problems in scientific research and engineering. Plagiarism means presenting, intentionally or otherwise, someone else's words, thoughts, analyses, argumentation, pictures, techniques, computer programs, etc., as one's own. The term has a wider meaning: paraphrasing someone else's text by replacing a few words with synonyms or rearranging some sentences is also plagiarism. Even restating in your own words a line of reasoning or an analysis made by someone else may constitute plagiarism if you add no content of your own. In doing so, you create the impression that you devised the argumentation yourself when this is not the case. The same applies if you combine pieces of work by various authors without citing the sources. Plagiarism has also increased with the spread of the Internet and the large amount of data now available. Plagiarism detection techniques distinguish between natural languages and programming languages. A similarity score is computed for each pair of documents to identify the pairs that match significantly. The SCAM (Stanford Copy Analysis Mechanism) plagiarism detection algorithm computes a relative measure of overlap by comparing the set of words common to a test document and a registered document. Our proposed detection process targets natural-language text and works by comparing documents. We have implemented a MapReduce-based SCAM algorithm on Hadoop to process big data and detect plagiarism in it. The standard SCAM algorithm is suitable for conventional data volumes, but not for big data processing.
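The abstract describes SCAM's overlap measure and its MapReduce port only in words. The following is a minimal local sketch under stated assumptions, not the authors' modified algorithm and not Hadoop code. It assumes the relative-frequency formulation usually attributed to the original SCAM work. In that formulation, subset(R, S) = Σ F_R(w)·F_S(w) / Σ F_R(w)², where the numerator runs only over "close" words satisfying ε − (F_R(w)/F_S(w) + F_S(w)/F_R(w)) > 0, and sim(R, S) = max(subset(R, S), subset(S, R)). The sketch further assumes all word weights equal 1, a threshold EPSILON = 2.5, and clamping sim to 1. It also assumes a hypothetical two-job decomposition: job 1 is keyed by word, and job 2 sums the per-pair and per-document terms. All function names are hypothetical, and map, shuffle, and reduce are simulated in one process.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter

EPSILON = 2.5  # assumed SCAM closeness threshold; must exceed 2


def map_word_counts(doc_id, text):
    """Job 1 map (hypothetical): emit (word, (doc_id, frequency)) per distinct word."""
    for word, freq in Counter(text.lower().split()).items():
        yield word, (doc_id, freq)


def reduce_word(word, postings):
    """Job 1 reduce (hypothetical): emit norm terms and close-word overlap terms."""
    postings = sorted(postings)  # deterministic (a, b) ordering of pairs
    for doc_id, freq in postings:
        # Every word contributes F(w)^2 to its document's denominator.
        yield ("norm", doc_id), freq * freq
    for (a, fa), (b, fb) in combinations(postings, 2):
        # Closeness test: only words with similar frequencies count as overlap.
        if EPSILON - (fa / fb + fb / fa) > 0:
            yield ("pair", a, b), fa * fb


def shuffle(pairs):
    """Local stand-in for the shuffle phase: sort, then group values by key."""
    pairs = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]


def scam_similarities(docs):
    """Run both jobs locally; return {(a, b): sim} for pairs with any overlap."""
    mapped = (kv for doc_id, text in docs.items()
              for kv in map_word_counts(doc_id, text))
    emitted = [kv for word, postings in shuffle(mapped)
               for kv in reduce_word(word, postings)]
    # Job 2: a plain summing reducer over the emitted terms.
    totals = {key: sum(values) for key, values in shuffle(emitted)}
    norms = {key[1]: total for key, total in totals.items() if key[0] == "norm"}
    sims = {}
    for key, overlap in totals.items():
        if key[0] != "pair":
            continue
        _, a, b = key
        # subset(a, b) and subset(b, a) share the numerator, not the denominator.
        subset_ab = overlap / norms[a]
        subset_ba = overlap / norms[b]
        sims[(a, b)] = min(1.0, max(subset_ab, subset_ba))
    return sims


if __name__ == "__main__":
    docs = {
        "registered": "the quick brown fox jumps over the lazy dog",
        "test": "the quick brown fox leaps over the lazy cat",
        "other": "completely unrelated text about hadoop clusters",
    }
    # The registered/test pair scores 9/11 (about 0.82); "other" shares no
    # words with either document, so it gets no entry (similarity 0).
    print(scam_similarities(docs))
```

Keying job 1 by word means a reducer only sees the documents that actually share that word, so document pairs with no common vocabulary never generate any work. The cost concentrates on very frequent words, which practical systems typically remove as stop words before matching.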
