Assessing lexical similarity between short sentences of source code based on granularity

Harpreet Kaur, Raman Maini

doi:10.1007/s41870-018-0213-1

Abstract

Detecting similarity between two source code bases, or within a single code base, has many applications in plagiarism detection and in finding reused code that can be refactored. This paper investigates state-of-the-art techniques on source code strings: Levenshtein distance, cosine similarity, Hamming distance, ASCII-based hashing, and Rabin–Karp rolling hashing. It extends previously published research work. The experiments show that Rabin–Karp hashing performs better than the other techniques in terms of running time, accuracy, and clone types. All of the techniques face the issue that similarity search time increases linearly with database size, whereas Rabin–Karp hashing handled this issue efficiently. Moreover, the Rabin–Karp rolling hash method reported the fewest false positives, and it can also handle multiple patterns at a time.
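
To illustrate the multi-pattern capability mentioned in the abstract, the following is a minimal Python sketch of Rabin–Karp rolling hashing applied to a source code string. It is not the authors' implementation: the function name `rabin_karp_multi`, the constants `BASE` and `MOD`, and the restriction to patterns of equal length are assumptions made for this example.

```python
from collections import defaultdict

BASE = 256            # assumed radix for the polynomial hash
MOD = 1_000_000_007   # assumed large prime modulus


def rabin_karp_multi(text, patterns):
    """Return {pattern: [start indices]} for equal-length patterns in text."""
    patterns = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    if not patterns:
        return {}
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and share one length")

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * BASE + ord(ch)) % MOD
        return h

    # Group patterns by hash so one window hash is checked against all of them.
    by_hash = defaultdict(list)
    for p in patterns:
        by_hash[full_hash(p)].append(p)

    hits = {p: [] for p in patterns}
    if len(text) < m:
        return hits

    high = pow(BASE, m - 1, MOD)  # weight of the character leaving the window
    h = full_hash(text[:m])
    for i in range(len(text) - m + 1):
        for p in by_hash.get(h, ()):
            # Compare the actual characters to rule out hash collisions.
            if text[i:i + m] == p:
                hits[p].append(i)
        if i + m < len(text):
            # Roll the window: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return hits


if __name__ == "__main__":
    code = "int a = b + c; int d = b + c;"
    print(rabin_karp_multi(code, ["b + c", "int a", "x + y"]))
```

Each window hash is derived from the previous one in constant time, and the explicit character comparison after a hash match is what keeps hash collisions from being reported as false positives in this sketch.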

Journal: International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management
Publication Date: Aug 1, 2018
Citations: 5
