Detecting code clones is necessary for ensuring high code quality and byte-level security, preserving intellectual property rights, and meeting various compliance requirements. Existing clone detection models either exhibit high complexity or have low efficiency when evaluated on large code bases. Moreover, most of these models consider only syntactic checks, which makes them inapplicable to cross-project analysis. To overcome these issues, this text proposes the design of an efficient novel pattern analysis model for identifying code clones via an augmented deep learning process that uses UML (Unified Modelling Language) based information sets. The proposed model is trained on different UML class diagram components, including methods, classes, attributes, relationships between classes, their associations, dependency levels, realizations, multiplicity instances, and interface patterns. All these pattern information sets are aggregated and processed by an Ant Lion Optimizer (ALO), which helps select highly variant feature sets. The selected features are classified into ‘clone’ and ‘original’ classes by a modified one-dimensional Convolutional Neural Network (CNN), which also estimates the probability of cloning. Because it evaluates UML metrics rather than language-specific syntax, the proposed model can be scaled to cross-project and cross-language deployments. The proposed model was tested on the GPT-J Code Clone Detection Dataset, Code Glue Dataset, and Smart Embed Code Clone Analysis Datasets. It was observed that the proposed model improved the accuracy of code clone detection by 5.9%, precision by 2.3%, and recall by 4.5% when compared with existing methods on similar data samples.
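As a rough illustration of the pipeline summarized above, the following minimal sketch encodes two UML classes as numeric feature vectors, keeps a subset of features through a fixed mask (a stand-in for the subset an optimizer such as ALO would select), and passes the pairwise differences through a single one-dimensional convolution with average pooling and a sigmoid to obtain a clone probability. The feature set, the names `UmlClassStats`, `pair_features`, and `clone_probability`, the mask, and the weights are all hypothetical and chosen by hand; this is not the authors' implementation, and no training or ALO search is performed here.

```python
import math
from dataclasses import dataclass


@dataclass
class UmlClassStats:
    """Hypothetical per-class counts read from a UML class diagram."""
    methods: int
    attributes: int
    associations: int
    dependencies: int
    realizations: int
    interfaces: int

    def to_vector(self):
        return [float(self.methods), float(self.attributes),
                float(self.associations), float(self.dependencies),
                float(self.realizations), float(self.interfaces)]


def select_features(vector, mask):
    """Keep entries whose mask bit is 1 (stand-in for an ALO-selected subset)."""
    return [v for v, keep in zip(vector, mask) if keep]


def pair_features(x, y, mask):
    """Absolute per-feature difference between two classes, then masked."""
    diff = [abs(p - q) for p, q in zip(x.to_vector(), y.to_vector())]
    return select_features(diff, mask)


def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation used in CNN layers."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]


def clone_probability(features, kernel, bias):
    """One convolution, global average pooling, then a sigmoid."""
    activations = conv1d(features, kernel)
    pooled = sum(activations) / len(activations)
    return 1.0 / (1.0 + math.exp(-(pooled + bias)))


# Hand-picked, untrained parameters (assumptions for illustration only).
MASK = [1, 1, 1, 1, 0, 1]
KERNEL = [-0.5, -0.5]   # larger differences push the score down
BIAS = 2.0

a = UmlClassStats(5, 3, 2, 1, 0, 1)
b = UmlClassStats(5, 3, 2, 1, 0, 1)    # structurally identical to a
c = UmlClassStats(12, 1, 0, 4, 2, 0)   # structurally different from a

p_same = clone_probability(pair_features(a, b, MASK), KERNEL, BIAS)
p_diff = clone_probability(pair_features(a, c, MASK), KERNEL, BIAS)
print(f"identical structure: {p_same:.2f}, different structure: {p_diff:.2f}")
```

Because the features are structural counts rather than tokens of a particular programming language, the same comparison applies to classes from different projects or languages, which is the property the abstract relies on for cross-project and cross-language use.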