Abstract

Source Code Authorship Attribution (SCAA) is a direct challenge to the privacy and anonymity of developers. However, it is also important for identifying malicious authors and tracing the origin of an attack. In this paper, we propose Source Code Authorship Attribution using Abstract Syntax Tree (SCAA-AST) for efficient classification of programmers. First, hierarchical AST features are generated from different source programs. Second, preprocessing techniques are used to obtain useful features free of noisy data. Third, the Term Frequency-Inverse Document Frequency (TF-IDF) weighting technique is used to highlight the significance of each feature. Fourth, the Adaptive Synthetic (ADASYN) oversampling method is used to address the class imbalance problem. Finally, a deep learning model is built with the TensorFlow framework and the Keras API to classify programming authors. The model is further configured with a dropout layer, learning rate, loss and activation functions, and dense layers to improve the classification results. The results show that the proposed approach outperforms existing techniques in classification accuracy.
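
To make the first and third steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes Python input parsed with the standard `ast` module, uses a hypothetical "depth:NodeType" term as the hierarchical feature, and applies the plain tf × log(N/df) form of TF-IDF; the abstract does not specify the exact features or weighting variant. The function names `ast_terms` and `tfidf` and the two toy samples are illustrative assumptions. The oversampling and neural network steps are omitted.

```python
import ast
import math
from collections import Counter
from typing import Dict, List


def ast_terms(source: str) -> List[str]:
    """Return one 'depth:NodeType' term per AST node (assumed feature format)."""
    terms: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Record the node type together with its depth in the tree.
        terms.append(f"{depth}:{type(node).__name__}")
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return terms


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weighted


# Two toy code samples standing in for two authors (illustrative only).
samples = {
    "author_a": "def add(a, b):\n    return a + b\n",
    "author_b": "total = 0\nfor i in range(10):\n    total += i\n",
}
docs = [ast_terms(src) for src in samples.values()]
weights = tfidf(docs)
```

In this sketch, a term that appears in every sample (such as the root `Module` node) receives zero weight, while a term that appears in only one sample (such as a top-level `FunctionDef`) receives a positive weight, which is the discriminating behaviour the weighting step relies on.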
