SOFTWARE QUALITY PREDICTION USING MACHINE LEARNING TECHNIQUES AND SOURCE CODE METRICS: A REVIEW

Santosh Saklani

doi:10.26483/ijarcs.v13i6.6918

Abstract

Software quality prediction is a Machine Learning (ML) based technique in which models are trained on historical data. Software experts can use the output of these quality models early in software development to improve quality by controlling attributes such as maintainability, reliability, and security. This study presents a systematic review of studies published from 2005 to 2021 that use ML techniques and source code metrics for Software Quality Prediction (SQP). The review assesses the ML techniques and source code metrics most commonly used for SQP, along with the commonly used datasets, feature selection techniques, and performance measures. In total, 53 primary studies are selected for review. The results show that Bayesian Learning (BL), Regression, Ensemble Learning (EL), Decision Tree (DT), and Support Vector Machine (SVM) are the most commonly used ML techniques for quality prediction, appearing in 58%, 52%, 41%, 32%, and 32% of the selected studies, respectively. NASA, PROMISE, Apache, Mozilla Firefox, and Eclipse are the most commonly used datasets for training and testing SQP models. LOC, CC, CBO, RFC, WMC, LCOM, DIT, and NOC are among the most commonly used source code metrics in SQP. Based on the results of the selected studies, it is concluded that ML techniques and source code metrics can help improve the overall quality of software.
