Detecting refactoring type of software commit messages based on ensemble machine learning algorithms

Dimah Al-Fraihat, Yousef Sharrab, Abdel-Rahman Al-Ghuwairi, Nour Sbaih, Ayman Qahmash

doi:10.1038/s41598-024-72307-0

Abstract

Refactoring is a well-established topic in contemporary software engineering: it improves a program's structural design without altering its external behavior. Commit messages play a vital role in tracking changes to the codebase. However, determining the exact refactoring required in the code can be challenging given the variety of refactoring types. Prior studies have attempted to classify refactoring documentation by type, achieving acceptable results in accuracy, precision, recall, F1-score, and other performance metrics; nevertheless, there is room for improvement. To address this, we propose a novel approach that uses four ensemble machine learning algorithms to detect refactoring types. Our experiments used a dataset of 573 commits, to which text cleaning and preprocessing were applied, including steps to address data imbalance. Several techniques were employed to improve accuracy: hyperparameter optimization, feature engineering with TF-IDF and bag-of-words, and binary transformation using one-vs-one and one-vs-rest classifiers. The results indicate that the experiment using TF-IDF feature engineering outperformed the other methods. Notably, the XGBoost algorithm combined with this technique achieved the best performance across all metrics, attaining 100% accuracy. Moreover, our results surpass the current state-of-the-art performance on the same dataset. The proposed approach has significant implications for software engineering, particularly for improving the internal quality of software.
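The abstract names two preprocessing ideas: TF-IDF versus bag-of-words features, and one-vs-one / one-vs-rest binary transformation of a multi-class problem. The following is a minimal, standard-library-only sketch of both, not the authors' pipeline. It assumes lower-case whitespace tokenization, an unsmoothed idf of ln(N / df), and hypothetical toy commit messages and labels (`DOCS`, `LABELS`) that are not drawn from the paper's 573-commit dataset; the function names are likewise illustrative. It does not implement XGBoost or the other ensemble classifiers.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy commit messages and refactoring labels, made up for
# illustration; they are NOT taken from the paper's 573-commit dataset.
DOCS = [
    "extract method from parser class",
    "rename variable for clarity",
    "move method to utility class",
    "extract class from large service",
]
LABELS = ["Extract", "Rename", "Move", "Extract"]


def tokenize(text):
    # Assumption: lower-case whitespace tokenization only; a real pipeline
    # would also strip punctuation, stop words, etc.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocab, rows) where rows[i][j] is the raw count of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tf_idf(docs):
    """Return (vocab, rows) weighted by tf * idf.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Libraries often use a smoothed idf instead, so absolute values differ.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    rows = []
    for row in counts:
        length = sum(row) or 1  # guard against empty messages
        rows.append([(c / length) * idf[j] for j, c in enumerate(row)])
    return vocab, rows


def one_vs_rest(labels):
    """One binary target vector per class: 1 for that class, 0 for every other class."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one(labels):
    """One binary task per unordered class pair (a, b), keeping only samples of a or b.

    Each task is a list of (sample_index, 1 if the label is a else 0).
    """
    classes = sorted(set(labels))
    tasks = {}
    for a, b in combinations(classes, 2):
        tasks[(a, b)] = [(i, 1 if y == a else 0) for i, y in enumerate(labels) if y in (a, b)]
    return tasks
```

Each one-vs-rest target vector (or one-vs-one task) would then be paired with the TF-IDF or bag-of-words rows to train one binary classifier; at prediction time one-vs-rest picks the class whose classifier scores highest, while one-vs-one takes a majority vote across the pairwise classifiers. In practice, library components such as scikit-learn's `TfidfVectorizer`, `CountVectorizer`, `OneVsRestClassifier`, and `OneVsOneClassifier` would typically replace these hand-written helpers.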

Journal: Scientific Reports
Publication Date: Sep 12, 2024
License type: cc-by-nc-nd

Similar Papers

Spam Mail Classification Using Ensemble and Non-Ensemble Machine Learning Algorithms
Khyati Agarwal ... Varun Dutt
23 Oct 2020

Detection of COVID-19 Using Textual Clinical Data: A Machine Learning Approach
Reenu Batra ... Virendra Kumar Shrivastava
01 Jan 2020

RR Interval-based Atrial Fibrillation Detection using Traditional and Ensemble Machine Learning Algorithms.
Sk Shrikanth Rao ... Roshan Joy Martis
Journal of Medical Signals & Sensors | VOL. 13
01 Feb 2023

Spatial Modeling of Asthma-Prone Areas Using Remote Sensing and Ensemble Machine Learning Algorithms
Seyed Vahid Razavi-Termeh ... Soo-Mi Choi
Remote Sensing | VOL. 13
13 Aug 2021