Abstract

Code clone detection helps reduce the costs of software maintenance and bug prevention. Many ways of detecting code clones have been suggested previously, including machine learning methods. Most clone detectors take a traditional approach: they can detect syntactic clones but perform poorly on semantic clones. To detect semantic clones, researchers use machine learning, which automatically scans the data to learn latent semantic features. In this study, we introduce a new formal model of similarity that combines similarity measures to capture both the syntactic and semantic distances between pairs of method blocks. The novelty of our study lies in using different similarity measures, and the resulting similarity scores, as features in machine learning to detect code clones. We compute a number of similarity measures to extract similarity score features, which are then represented as vectors. Using ensemble classification models, we perform extensive comparisons and evaluations of the effectiveness of the proposed approach. The results indicate that our approach detects clone types significantly better than contemporary code clone detectors. We achieved a 99% success rate in detecting cloned code, as measured by F-score, recall, and precision, and our approach achieves 98–100% accuracy in the majority of cases.
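
The general idea of turning similarity scores into a feature vector and passing it to an ensemble can be sketched as follows. This is a minimal illustration only, not the study's implementation: the abstract does not name the similarity measures or classifiers used, so the three measures here (Jaccard, cosine, and a `difflib` sequence ratio over tokens), the simple majority vote of one-feature threshold classifiers, and the tiny training set are all assumptions chosen for brevity. These token-level measures mostly capture syntactic similarity.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(code):
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def jaccard(a, b):
    """Set overlap of tokens; 0.0 when both token lists are empty."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of token-count vectors; 0.0 if either is empty."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def sequence_ratio(a, b):
    """Order-sensitive similarity of the two token sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def feature_vector(method_a, method_b):
    """One similarity score per measure for a pair of method blocks."""
    ta, tb = tokenize(method_a), tokenize(method_b)
    return [jaccard(ta, tb), cosine(ta, tb), sequence_ratio(ta, tb)]


def fit_stumps(X, y):
    """Per feature, place a threshold midway between the class means."""
    thresholds = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        thresholds.append((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)
    return thresholds


def predict(thresholds, x):
    """Majority vote of the per-feature threshold classifiers."""
    votes = sum(1 for value, t in zip(x, thresholds) if value >= t)
    return 1 if votes * 2 > len(thresholds) else 0


# Hypothetical labelled pairs (1 = clone, 0 = not a clone), for illustration.
train_pairs = [
    ("int add(int a, int b) { return a + b; }",
     "int sum(int x, int y) { return x + y; }", 1),
    ("for (int i = 0; i < n; i++) total += v[i];",
     "for (int j = 0; j < n; j++) acc += v[j];", 1),
    ("int add(int a, int b) { return a + b; }",
     "void log(String s) { System.out.println(s); }", 0),
    ("for (int i = 0; i < n; i++) total += v[i];",
     "String name() { return this.name; }", 0),
]
X = [feature_vector(a, b) for a, b, _ in train_pairs]
y = [label for _, _, label in train_pairs]
thresholds = fit_stumps(X, y)
```

A new pair is classified with `predict(thresholds, feature_vector(m1, m2))`. In practice the hand-rolled vote would be replaced by a trained ensemble model and a much larger labelled corpus.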
