Improving code quality is one of the most significant challenges in the software industry. Deep learning is an emerging area of research for detecting code smells and addressing refactoring requirements. The aim of this study is to develop a deep learning-based system for code modification analysis that predicts the locations and types of code modifications while significantly reducing the need for manual labeling. We created an experimental dataset by collecting historical code data from open-source project repositories. We introduce a novel class-level, abstract syntax tree-based code embedding method for code analysis, and employ a recurrent neural network to identify code modification requirements. Our system achieves an average accuracy of approximately 83% across different repositories and 86% on the entire dataset. These findings indicate that our system outperforms the method-based and text-based code embedding approaches. In addition, we performed a comparative analysis with a static code analysis tool to support the readiness of the proposed model for deployment. The correlation coefficient between the outputs of the two is 0.67, indicating a significant correlation. Consequently, this research shows that deep learning-based analysis of code histories helps software teams identify potential code modification requirements.