Code Comments: A Way of Identifying Similarities in the Source Code

Rares Folea, Emil Slusanschi

doi:10.3390/math12071073

Abstract

This study investigates whether analyzing the comments available in source code can effectively reveal functional similarities within software. The authors explore how both machine-readable comments (such as linter instructions) and human-readable comments (in natural language) can contribute to measuring code similarity. For the former, the work relies on computing the cosine similarity over the one-hot encoded representation of the machine-readable comments. For the latter, the focus is on detecting similarities in English comments by applying thresholds to the similarity measurements obtained from models based on Levenshtein distances (for form-based matches), Word2Vec (for contextual word representations), and deep learning models such as Sentence Transformers or Universal Sentence Encoder (for semantic similarity). For evaluation, this research analyzed the similarities between different source code versions of the open-source code editor VSCode, based on existing ESLint-specific directives, and applied natural language processing techniques to incremental releases of Kubernetes, an open-source system for automating containerized application management. The experiments outline the potential for detecting code similarities solely based on comments, and observations indicate that models like Universal Sentence Encoder provide a favorable balance between recall and precision. This research is integrated into Project Martial, an open-source project for automatic assistance in detecting plagiarism in software.
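As an illustration of the two simplest measures named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each file is summarized by the set of linter rule names found in its directives, and that two natural-language comments count as a match when their normalized Levenshtein similarity reaches a threshold. The function names, the example rule names, and the 0.8 threshold are hypothetical choices made for this example.

```python
import math


def one_hot(directives, vocabulary):
    """Encode a collection of directive names as a 0/1 vector over a fixed vocabulary."""
    present = set(directives)
    return [1 if name in present else 0 for name in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def comments_match(a, b, threshold=0.8):
    """Threshold-based match on normalized Levenshtein similarity (threshold is an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    similarity = 1 - levenshtein(a, b) / longest
    return similarity >= threshold


# Hypothetical directive sets extracted from two versions of a file.
file_a = ["no-console", "no-unused-vars"]
file_b = ["no-console", "max-len"]
vocabulary = sorted(set(file_a) | set(file_b))
directive_similarity = cosine_similarity(one_hot(file_a, vocabulary),
                                         one_hot(file_b, vocabulary))
print(directive_similarity)  # about 0.5: one shared rule out of two per file
print(comments_match("Returns the pod name", "Returns the pod names"))
```

The semantic models mentioned in the abstract (Word2Vec, Sentence Transformers, Universal Sentence Encoder) would replace the Levenshtein step with a comparison of embedding vectors, but the thresholding idea stays the same.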


Journal: Mathematics
Publication Date: Apr 2, 2024
Citations: 1
License type: CC BY 4.0


