Abstract

Code cloning that stems from code reuse complicates software maintenance. Deep learning-based clone detection tools are plentiful, but most target a single programming language. Meanwhile, identifying algorithms in program code remains challenging because the code carries no metadata, which makes the selection process difficult. This research connects these two areas. It explores the largely unexamined use of data augmentation, specifically transcompiler-based techniques, to improve the detection of code clones across programming languages. Data augmentation is applied through source-to-source translation, exemplified by TransCoder, which extends single-language models such as the Graph Matching Network (GMN) to cross-language detection. In addition, a program code classification model is introduced that uses Convolutional Neural Networks (CNNs) to identify algorithms from structural features (SFs). The SFs are extracted from program code and transformed into a one-hot binary matrix. The CNN model is then tuned, with its structural configuration and hyperparameters optimized for algorithm classification. The work thus brings cross-language clone detection and algorithm identification together within a unified framework. The exploration of data augmentation for cross-language clone detection and the use of CNN-GMNs for algorithm identification contribute to both domains, and the resulting insights hold promise for advancing software engineering and programming education.
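As a rough illustration of the one-hot encoding step mentioned in the abstract, the following minimal sketch turns a sequence of structural features into a binary matrix. The feature vocabulary `SF_VOCAB`, the function name `one_hot_matrix`, and the `max_len` padding parameter are all hypothetical; the abstract does not specify which features are extracted or how the matrix is laid out.

```python
# Hypothetical vocabulary of structural features (SFs); the abstract does not list them.
SF_VOCAB = ["for", "while", "if", "else", "return", "call", "assign"]


def one_hot_matrix(sf_sequence, vocab=SF_VOCAB, max_len=None):
    """Encode a sequence of SF tokens as a binary matrix.

    Rows correspond to positions in the sequence, columns to vocabulary entries.
    Sequences shorter than max_len are padded with all-zero rows, and longer
    ones are truncated (an assumed convention, not taken from the paper).
    """
    index = {sf: i for i, sf in enumerate(vocab)}
    length = max_len if max_len is not None else len(sf_sequence)
    matrix = [[0] * len(vocab) for _ in range(length)]
    for row, sf in enumerate(sf_sequence[:length]):
        col = index.get(sf)
        if col is not None:  # features outside the vocabulary leave an all-zero row
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Example: SFs extracted (by some assumed front end) from a small loop-based routine.
    for row in one_hot_matrix(["for", "if", "return"], max_len=4):
        print(row)
```

A fixed `max_len` gives every program a matrix of the same shape, which is the kind of input a CNN typically expects; the actual dimensions used in the study are not stated in the abstract.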
