A universal cross language software similarity detector for open source software categorization

Kawser Wazed Nafi, Banani Roy, Chanchal K Roy, Kevin A Schneider

doi:10.1016/j.jss.2019.110491

Abstract

While there are novel approaches for detecting and categorizing similar software applications, previous research has focused on detecting similarity among applications written in the same programming language, not among applications written in different programming languages. Cross-language software similarity detection is inherently more challenging because of variations in language, application structure, support libraries, and naming conventions. In this paper, we propose a novel model, CroLSim, to detect similar software applications across different programming languages. We define a semantic relationship among cross-language libraries and API methods (both local and third-party) using their functional descriptions and a word-vector learning model. Our experiments show that CroLSim can successfully detect cross-language similar software applications and outperforms all existing approaches (mean average precision of 0.65, confidence rate of 3.6, and 75% highly rated successful queries). Furthermore, we applied CroLSim to a source code repository to see whether our model can recommend cross-language source code fragments when queried directly with source code. Our experiments show that CroLSim can recommend functionally similar cross-language source code when source code is used directly as a query (average precision = 0.28, recall = 0.85, and F-measure = 0.40).
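The abstract states only that CroLSim relates API methods across languages through their functional descriptions and a word-vector model. As a minimal sketch of that idea, assuming (not from the paper) that each description is represented by the average of its word vectors and that two descriptions are compared with cosine similarity, the Python below uses a tiny hypothetical vocabulary. A real system would learn the vectors from API documentation with a model such as word2vec. The names `WORD_VECTORS`, `description_vector`, and `api_similarity` are illustrative only.

```python
import math

# Hypothetical toy word vectors (assumption, not CroLSim's data); a real
# system would learn these from API documentation with a word-vector model.
WORD_VECTORS = {
    "read":   [0.9, 0.1, 0.0],
    "file":   [0.8, 0.2, 0.1],
    "open":   [0.7, 0.3, 0.0],
    "stream": [0.6, 0.2, 0.2],
    "sort":   [0.0, 0.1, 0.9],
    "list":   [0.1, 0.2, 0.8],
}


def description_vector(description):
    """Average the vectors of all known words in a functional description.

    Returns None when no word of the description is in the vocabulary.
    """
    words = [w for w in description.lower().split() if w in WORD_VECTORS]
    if not words:
        return None
    dim = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(WORD_VECTORS[w]):
            total[i] += x
    return [x / len(words) for x in total]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def api_similarity(desc_a, desc_b):
    """Semantic similarity of two API descriptions, possibly from different languages."""
    va, vb = description_vector(desc_a), description_vector(desc_b)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)
```

For example, `api_similarity("read a file", "open file stream")` scores close to 1, while `api_similarity("read a file", "sort a list")` scores much lower; that is the kind of signal that could link, say, a Java file-reading API to a Python one. Averaging is the simplest way to turn a variable-length description into a single vector, but it ignores word order, which may or may not be adequate for short API descriptions.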

Journal: Journal of Systems and Software
Publication Date: Dec 4, 2019
Citations: 13

Similar Papers

In Search of the Original Fortran Compiler
Paul McJones
IEEE Annals of the History of Computing | VOL. 39
01 Jan 2017

Flowchart-Based Cross-Language Source Code Similarity Detection
Feng Zhang ... Qian Song
Scientific Programming | VOL. 2020
17 Dec 2020

[Research Paper] CroLSim: Cross Language Software Similarity Detector Using API Documentation
Kawser Wazed Nafi ... Chanchal K Roy
01 Sep 2018

Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning
Elife Ozturk Kiyak ... Kokten Ulas Birant
SN Computer Science | VOL. 1
14 Aug 2020
