Abstract

BACKGROUND: Fault data are vital for predicting fault-proneness in large systems. Predicting faulty classes helps allocate appropriate testing resources for future releases. However, current fault data face challenges such as unlabeled instances and data imbalance, both of which degrade the performance of prediction models. Imbalance arises because the majority of classes are labeled as not faulty while only a minority are labeled as faulty. AIM: This research proposes to improve fault prediction by using software metrics in combination with threshold values. Statistical techniques are proposed to improve the quality of the datasets and, therefore, the quality of the fault prediction. METHOD: Threshold values of object-oriented metrics are used to label classes as faulty in order to improve the fault prediction models. The resulting datasets are used to build prediction models with five machine learning techniques. The use of threshold values is validated on ten large object-oriented systems. RESULTS: Models are built for the datasets with and without the use of thresholds. Combining thresholds with machine learning significantly improved the fault prediction models for all five classifiers. CONCLUSION: Threshold values can be used to label software classes as fault-prone and to improve machine learners in predicting fault-prone classes.
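
To make the METHOD concrete, below is a minimal sketch of the threshold-based labeling step followed by training one classifier. It assumes the CK metrics CBO, WMC, and RFC; the threshold values and the metric table are hypothetical placeholders, not the values derived in the paper, and a random forest merely stands in for one of the five (unnamed) machine learning techniques.

    # Sketch: label classes as fault-prone via metric thresholds, then train
    # a classifier on the resulting dataset. Thresholds and data are
    # illustrative placeholders only.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Hypothetical per-metric thresholds: a class exceeding any threshold
    # is labeled fault-prone (1), otherwise not fault-prone (0).
    THRESHOLDS = {"CBO": 9, "WMC": 20, "RFC": 40}

    def label_by_thresholds(df: pd.DataFrame) -> pd.Series:
        """Label a class as fault-prone if any metric exceeds its threshold."""
        exceeds = pd.concat([df[m] > t for m, t in THRESHOLDS.items()], axis=1)
        return exceeds.any(axis=1).astype(int)

    # Fabricated metric values for eight classes, for illustration only.
    data = pd.DataFrame({
        "CBO": [3, 12, 7, 15, 2, 10, 5, 18],
        "WMC": [10, 25, 18, 30, 5, 22, 8, 35],
        "RFC": [20, 55, 35, 60, 12, 45, 15, 70],
    })
    labels = label_by_thresholds(data)

    # Train and evaluate one stand-in classifier on the labeled dataset.
    X_train, X_test, y_train, y_test = train_test_split(
        data, labels, test_size=0.25, random_state=0, stratify=labels
    )
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))

In the paper's setting, the same labeled dataset would be fed to each of the five classifiers in turn, and the models built with threshold-derived labels would be compared against models built without them.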
