Refactoring is the process of improving the internal structure of source code without altering its external behavior. Existing deep learning-based refactoring detection approaches rely on commit messages to extract features. However, commit messages are not reliable enough, because developers do not consistently record refactoring activities in them. Furthermore, current approaches are designed for a single programming language and do not support multilingual refactoring detection. To address these limitations, this paper proposes RefT5, a multilingual code refactoring detection approach based on deep learning. First, we select 110 real-world projects written in Java and Python as the corpus for constructing the dataset. Second, we extract commit messages, code changes, and refactoring types from these projects; RefT5 generates edit sequences from the code changes and uses the refactoring types as labels. Third, we employ CodeT5 and a BiLSTM with attention to extract semantic and structural features and generate feature vectors. Finally, the feature vectors are fed into a classification layer that predicts the refactoring type. The experimental results show that RefT5 achieves 98.05% precision and 97.77% recall. Compared with existing approaches, it improves precision by 51.61% and recall by 52.9% on average, demonstrating its effectiveness.
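The abstract does not specify how edit sequences are derived from code changes. As a rough illustration only, the sketch below builds a token-level edit sequence from the before and after versions of a code fragment using Python's standard `difflib`. The tokenizer, the `KEEP`/`DEL`/`ADD` tag names, and the function names are assumptions made for this example and are not taken from RefT5.

```python
import difflib
import re

# Hypothetical, deliberately simple tokenizer: identifiers, integers,
# and any other single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split a code fragment into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def edit_sequence(before, after):
    """Return a list of (tag, token) pairs describing the change.

    Tags are illustrative: KEEP (unchanged), DEL (removed), ADD (inserted).
    A replaced span is emitted as its deletions followed by its insertions.
    """
    old, new = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            edits.extend(("KEEP", tok) for tok in old[i1:i2])
        else:
            # "delete", "insert", and "replace" all reduce to DEL and/or ADD.
            edits.extend(("DEL", tok) for tok in old[i1:i2])
            edits.extend(("ADD", tok) for tok in new[j1:j2])
    return edits


before = "def calc(x): return x + 1"
after = "def compute(x): return x + 1"
edits = edit_sequence(before, after)
```

In a pipeline like the one described, a sequence of this kind would be serialized and passed to the encoder, with the refactoring type (for example, a method rename) serving as the label. How RefT5 actually encodes its edit sequences is not stated in the abstract.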