Abstract

To support code clone detection when source code is unavailable, without sacrificing detection quality, this paper proposes a code clone detection method based on bytecode and twin (Siamese) neural networks. The method first extracts each function’s opcode sequence from the bytecode instruction file. The opcodes are then vectorized with a pre-trained neural network model so that the vectors carry semantic information. Next, a GRU-based twin neural network computes the similarity between the vector sequences. The model is evaluated on Opcode21K, a dataset dedicated to bytecode. A total of 5,818,611 real clone pairs and 279,112 fake clone pairs are detected, and the clone pairs labeled in Opcode21K are plotted on an ROC curve, from which a distance value of 0.7 is selected as the clone detection threshold. The number of clone pairs detected by SJBCD, along with its accuracy and recall, is much higher than that of most existing methods. Large-difference code clones account for about 20% to 50% of the total clones detected. In addition, the method has the shortest runtime on datasets ranging from 1M to 30M lines of code, and detection on a 250M dataset takes approximately 54.5 hours. The proposed algorithm can therefore handle code clone detection in a variety of situations, which can help improve the efficiency of software development.
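The pipeline described above (opcode sequence, opcode vectors, a shared GRU encoder, a distance compared against a threshold) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The toy opcode vocabulary, the random (untrained) embeddings and weights, the omission of bias terms, the use of Manhattan distance between final hidden states, and the rule "distance at or below 0.7 means clone" are all assumptions made for illustration; only the 0.7 threshold value comes from the abstract.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would cover the full opcode set.
OPCODES = ["aload_0", "iload_1", "iadd", "imul", "ireturn", "invokevirtual"]
EMBED_DIM, HIDDEN_DIM = 4, 3
THRESHOLD = 0.7  # distance threshold reported in the abstract

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Stand-in for the pre-trained opcode embeddings (random here, NOT trained).
EMBEDDING = {op: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for op in OPCODES}

# One set of GRU weights, shared by both branches of the twin network.
# Keys: "z" = update gate, "r" = reset gate, "h" = candidate state.
W = {g: rand_matrix(HIDDEN_DIM, EMBED_DIM) for g in "zrh"}
U = {g: rand_matrix(HIDDEN_DIM, HIDDEN_DIM) for g in "zrh"}


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h):
    """One GRU update (bias terms omitted for brevity)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(W["h"], x), matvec(U["h"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(opcodes):
    """Run the shared GRU over an opcode sequence; return the final hidden state."""
    h = [0.0] * HIDDEN_DIM
    for op in opcodes:
        h = gru_step(EMBEDDING[op], h)
    return h


def distance(seq_a, seq_b):
    """Manhattan distance between the two encodings (an assumed metric)."""
    ha, hb = encode(seq_a), encode(seq_b)
    return sum(abs(a - b) for a, b in zip(ha, hb))


def is_clone(seq_a, seq_b, threshold=THRESHOLD):
    """Assumed decision rule: a pair is a clone if its distance is within the threshold."""
    return distance(seq_a, seq_b) <= threshold
```

Because both sequences pass through the same weights, identical opcode sequences always map to distance 0 and are reported as clones. With untrained weights the distances between different sequences carry no meaning; in a trained model the threshold would be chosen from labeled pairs, as the abstract describes for the ROC curve.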
