Abstract

In this research, we investigate the efficient detection of similar academic papers. Given a specific paper and a corpus of academic papers, most of the corpus is first filtered out using a fast filtering method. Then, 47 methods (baseline methods and combinations of them) are applied to detect similar papers; 34 of these methods are variants of new methods. The 34 methods are divided into three new method sets: rare words, combinations of at least two methods, and methods that compare portions of the papers. Some of the 34 heuristic methods achieve better results than previous heuristic methods when compared against the “Full Fingerprint” (FF) method, an expensive method that served as the expert. At the same time, the run time of the new methods is far lower than that of the FF method. The most interesting finding is a method called CWA(1), which computes the frequency of rare words that appear exactly once in both compared papers; this method proves to be an efficient measure of whether two papers are similar.
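To make the idea concrete, the sketch below illustrates a CWA(1)-style score as described above: it counts words that are rare in a background corpus and occur exactly once in each of the two compared papers. The function name `cwa1_score`, the document-frequency threshold for "rare", and the normalization are illustrative assumptions, not the paper's exact definition.

```python
from collections import Counter

def cwa1_score(words_a, words_b, corpus_doc_freq, rare_threshold=5):
    """Hypothetical sketch of a CWA(1)-style similarity score.

    Counts words that are rare in the background corpus (document
    frequency below `rare_threshold`) and that occur exactly once in
    each of the two compared papers, then normalizes by the number of
    such rare words seen in either paper. Threshold and normalization
    are assumptions made for illustration only.
    """
    counts_a = Counter(words_a)
    counts_b = Counter(words_b)

    def rare_once(counts):
        # Words occurring exactly once in this paper and rarely in the corpus.
        return {w for w, c in counts.items()
                if c == 1 and corpus_doc_freq.get(w, 0) < rare_threshold}

    shared = rare_once(counts_a) & rare_once(counts_b)
    union = rare_once(counts_a) | rare_once(counts_b)
    return len(shared) / len(union) if union else 0.0
```

A higher score indicates that the two papers share many rare, once-occurring words, which the abstract reports as an efficient indicator of paper similarity.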
