Improving the accuracy of recurrent neural networks models in predicting software bug based on undersampling methods

Nasraldeen Alnor Adam Khleel, Károly Nehéz

doi:10.11591/ijeecs.v32.i1.pp478-493

Open Access

Abstract

<span>Identifying software bugs is of paramount importance because it ensures software reliability and facilitates maintenance activities. Software quality improvement relies heavily on software bug prediction (SBP), in which accurately identifying defective source code is a significant challenge. Numerous machine learning (ML) models have been developed specifically to address this challenge. Nonetheless, the class imbalance issue restricts the ability of these models to predict software bugs accurately: it hinders their efficiency and leads to imbalanced false-positive and false-negative outcomes. Previous studies have paid limited attention to class imbalance in SBP. This study aims to fill this research gap by combining two recurrent neural networks (RNNs), namely long short-term memory (LSTM) and gated recurrent unit (GRU), with an undersampling method (near miss). Experiments were conducted on publicly available benchmark datasets, considering both class-level and file-level metrics. The experimental results indicate that our models outperform others and that combining RNN models with undersampling methods improves bug prediction performance, particularly for datasets with imbalanced class distributions.</span>
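
To make the undersampling step concrete, the following is a minimal sketch of near-miss undersampling in plain Python. It is not the authors' implementation: the abstract does not say which near-miss variant was used, so the NearMiss-1 rule is assumed here (keep the majority-class samples whose mean distance to their k nearest minority-class samples is smallest). The function name `near_miss_1`, the parameter `k`, the binary labels (1 = buggy, 0 = clean), and the toy feature vectors are all hypothetical and chosen only for illustration.

```python
import math


def near_miss_1(X, y, k=3, minority_label=1):
    """Undersample the majority class with a NearMiss-1 style rule.

    Assumption: binary labels, with `minority_label` marking the rare
    (e.g. buggy) class. Returns a balanced (X, y) pair as lists.
    """
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]

    # Nothing to do if there is no minority class or the data is
    # already balanced (or skewed the other way).
    if not minority or len(majority) <= len(minority):
        return list(X), list(y)

    k_eff = min(k, len(minority))

    def mean_dist_to_nearest_minority(sample):
        # Mean Euclidean distance to the k_eff closest minority samples.
        x, _ = sample
        dists = sorted(math.dist(x, m) for m, _ in minority)
        return sum(dists[:k_eff]) / k_eff

    # Keep as many majority samples as there are minority samples,
    # preferring those closest to the minority class.
    kept = sorted(majority, key=mean_dist_to_nearest_minority)[: len(minority)]

    resampled = minority + kept
    return [x for x, _ in resampled], [lab for _, lab in resampled]


# Toy example: two "buggy" samples and four "clean" ones.
X = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (2.0, 2.0), (5.0, 5.0), (9.0, 9.0)]
y = [1, 1, 0, 0, 0, 0]

X_res, y_res = near_miss_1(X, y, k=2)
print(X_res)
print(y_res)
```

In a pipeline like the one the abstract describes, the balanced output would then be used to train the LSTM or GRU classifier; that step is omitted here because the abstract gives no architectural details.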
