Abstract
Understanding how developers refactor their code is critical to support the design improvement process of software. This paper investigates to what extent code metrics are good indicators for predicting refactoring activity in the source code. In order to perform this, we formulated the prediction of refactoring operation types as a multi-class classification problem. Our solution relies on measuring metrics extracted from committed code changes in order to extract the corresponding features (i.e., metric variations) that better represent each class (i.e., refactoring type) in order to automatically predict, for a given commit, the method-level type of refactoring being applied, namely Move Method, Rename Method, Extract Method, Inline Method, Pull-up Method, and Push-down Method. We compared various classifiers, in terms of their prediction performance, using a dataset of 5004 commits and extracted 800 Java projects. Our main findings show that the random forest model trained with code metrics resulted in the best average accuracy of 75%. However, we detected a variation in the results per class, which means that some refactoring types are harder to detect than others.
Highlights
Refactoring is the practice of improving software internal design without altering its external behavior
We implemented a model with only commit messages as input, but this approach led us to more research with other inputs
Combining commit messages with code metrics was our second experiment, and the model built with LSTM produced 54.3% of accuracy
Summary
Refactoring is the practice of improving software internal design without altering its external behavior. Developers regularly refactor their code by performing various refactoring types, including splitting methods, renaming attributes, moving classes, and merging packages. Recent studies have been focusing on recommending appropriate refactoring types in response to poor code design [1,2,3,4] and analyzing how developers refactor code by making mining code changes and commit messages [5,6,7,8,9]. Empirical studies have been focused on mining commit messages to extract developers’ intents behind refactoring in terms of optimizing structural metrics (e.g., coupling, complexity, etc.) [10,11] and quality attributes (e.g., reuse, etc.) [12,13]. Our model can learn to detect patterns in metrics values, which can be later combined with textual information in order to support the accurate distinction the refactoring types (move, extract, inline, etc.)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.