Analysis of machine learning methods in the task of searching duplicates in the software code

Tetiana Kaliuzhna, Yevhenii Kubiuk

doi:10.15587/2706-5448.2022.263235


Abstract

The object of the study is Python source code, which is analyzed with machine learning methods to identify clones. The work examines machine learning methods for clone detection in program code and implements a decision tree model for this task. It also analyzes existing machine learning approaches to detecting duplicates in program code. The comparison identified the advantages and disadvantages of each algorithm, and the results are summarized in comparison tables. The analysis showed that the decision-tree-based method gives the best result in finding clones in program code and is the best choice in terms of both accuracy and implementation. The result of the work is a model that classifies cloned and non-cloned code with an accuracy of more than 99 % on an automatically generated dataset, in a minimal amount of time. The system leaves several open questions for future research, which are listed in the work. The proposed model can be developed further in the following directions: recognition of clones rewritten from one programming language to another; detection of vulnerabilities in the code; and improvement of model performance by creating more universal datasets. The prospect of the work lies in training a decision tree model for accurate and fast detection of code clones, which could be widely used for plagiarism detection in both educational institutions and IT companies.
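The abstract does not describe the features, dataset, or implementation behind the model, so the following is only a minimal sketch of the general idea: represent a pair of Python snippets by a few numeric features and let a decision tree separate clone pairs from non-clone pairs. Everything here is an assumption made for illustration, including the two AST-based features, the toy snippets, and the helper names `pair_features`, `build_tree`, and `predict`. It uses only the standard library and is not the authors' model.

```python
import ast
from collections import Counter


def node_type_counts(source):
    """Count AST node types in a Python snippet (identifier names are ignored)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def pair_features(a, b):
    """Illustrative features for a code pair: node-type overlap and size ratio."""
    ca, cb = node_type_counts(a), node_type_counts(b)
    overlap = sum((ca & cb).values()) / sum((ca | cb).values())
    na, nb = sum(ca.values()), sum(cb.values())
    return [overlap, min(na, nb) / max(na, nb)]


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority class of 0/1 labels; ties resolve to 0."""
    return int(2 * sum(labels) > len(labels))


def build_tree(rows, labels, depth=0, max_depth=3):
    """Greedy binary decision tree; leaves are class labels (0 or 1)."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:  # no threshold separates the rows
        return majority(labels)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Toy snippets (hypothetical): each *_RENAMED differs only in identifier names.
TOTAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
TOTAL_RENAMED = "def acc(vals):\n    r = 0\n    for v in vals:\n        r += v\n    return r\n"
GREET = "def greet(name):\n    print('hello ' + name)\n"
GREET_RENAMED = "def hi(who):\n    print('hey ' + who)\n"
POINT = "class Point:\n    def __init__(self, x):\n        self.x = x\n"
POINT_RENAMED = "class Dot:\n    def __init__(self, v):\n        self.v = v\n"

# Labeled training pairs: 1 = clone, 0 = not a clone.
pairs = [
    (TOTAL, TOTAL_RENAMED, 1), (GREET, GREET_RENAMED, 1),
    (TOTAL, GREET, 0), (GREET, POINT, 0), (TOTAL_RENAMED, POINT, 0),
]
rows = [pair_features(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
tree = build_tree(rows, labels)

# A renamed pair that was not in the training set; prints 1 (clone).
print(predict(tree, pair_features(POINT, POINT_RENAMED)))
```

Because the features ignore identifier names, this toy setup only recognizes renamed copies. A realistic dataset would need richer features and far more pairs, and a library implementation of decision trees would normally replace the hand-written one.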

Journal: Technology audit and production reserves
Publication Date: Aug 26, 2022
License type: CC BY
