Abstract

Quality assurance activities such as testing, verification and validation, fault tolerance and fault prediction are among the central concerns of software engineering. When a company lacks the budget and time to test an entire application, a project manager can use fault prediction algorithms to identify the parts of the system that are more defect-prone. Software engineering offers many prediction approaches, such as test-effort, security and cost prediction. Since most of them do not have a stable model, this paper studies software fault prediction using different machine learning techniques, including decision trees, decision tables, random forest, neural networks and Naïve Bayes, as well as distinctive classifiers from artificial immune systems (AIS) such as the artificial immune recognition system (AIRS), CLONALG and Immunos. We use four public NASA datasets for our experiments; these datasets differ in size and in the number of defective instances. Distinct parameters, namely method-level metrics and two feature selection approaches, principal component analysis (PCA) and correlation-based feature selection (CFS), are used to determine which configuration performs best. According to this study, random forest provides the best prediction performance for large datasets, and Naïve Bayes is a reliable algorithm for small datasets even when one of the feature selection techniques is applied. Among the AIS classifiers, Immunos99 performs well when a feature selection technique is applied, and AIRSParallel performs better without any feature selection. Performance is evaluated with three metrics: the area under the receiver operating characteristic curve (AUC), the probability of detection (PD) and the probability of false alarm (PF). Together, these three metrics give reliable prediction criteria.
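The three evaluation metrics named in the abstract can all be derived from a classifier's predictions on a labeled test set. A minimal sketch in plain Python (the counts and score values below are illustrative, not results from the paper):

```python
# PD (probability of detection, i.e. recall) and PF (probability of
# false alarm) from binary confusion-matrix counts, plus AUC computed
# directly from its ranking interpretation: the probability that a
# randomly chosen defective module receives a higher score than a
# randomly chosen non-defective one.

def pd_pf(tp, fn, fp, tn):
    pd = tp / (tp + fn)   # fraction of defective modules correctly flagged
    pf = fp / (fp + tn)   # fraction of clean modules wrongly flagged
    return pd, pf

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only:
print(pd_pf(tp=30, fn=10, fp=5, tn=55))          # (0.75, 0.0833...)
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))   # 0.75
```

A high PD alone is not informative, since a classifier that flags every module reaches PD = 1; this is why the paper pairs PD with PF and AUC.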

Highlights

  • As today’s software grows rapidly in size and complexity, the prediction of software reliability plays a crucial role in the software development process [1]

  • We identified fault prediction algorithms based on different machine learning classifiers and distinct feature selection techniques

  • This study shows that applying different feature selection techniques has little effect on the results; they mainly reduce the execution time



Introduction

As today’s software grows rapidly in size and complexity, the prediction of software reliability plays a crucial role in the software development process [1]. The research questions are as follows:

  • RQ1: Which machine learning algorithm performs best on small and large datasets when 21 method-level metrics are used?

  • RQ2: Which AIS algorithm performs best on small and large datasets when 21 method-level metrics are used?

  • RQ3: Which machine learning algorithm performs best on small and large datasets when 37 method-level metrics are used?

  • RQ4: Which machine learning algorithm performs best on small and large datasets when PCA and CFS are applied to the 21 method-level metrics?

  • RQ5: Which AIS algorithm performs best on small and large datasets when PCA and CFS are applied to the 21 method-level metrics?

In experiment 4, to answer the last question, we doubled the defect rate of the CM1 dataset to see whether this affects the performance of the prediction models.
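The experimental setup behind RQ1 and RQ4, training the studied classifiers on method-level metrics with and without feature selection, can be sketched with scikit-learn. This is an illustration under stated assumptions, not the paper's code: synthetic, imbalanced data stands in for the NASA 21-metric datasets, PCA is used for feature selection (CFS has no standard scikit-learn implementation), and only two of the studied classifiers are shown.

```python
# Sketch: comparing random forest and Naive Bayes with and without
# PCA feature selection, scored by cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 500 modules, 21 metrics, ~10% defective.
X, y = make_classification(n_samples=500, n_features=21,
                           weights=[0.9, 0.1], random_state=0)

models = {"RF": RandomForestClassifier(random_state=0),
          "NB": GaussianNB()}

results = {}
for name, clf in models.items():
    variants = [("raw", clf),
                ("PCA", make_pipeline(PCA(n_components=5), clf))]
    for label, est in variants:
        score = cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
        results[f"{name}-{label}"] = score
        print(f"{name} ({label}): AUC = {score:.3f}")
```

The same loop structure extends to the other classifiers studied (decision trees, decision tables, neural networks and the AIS family), which is essentially what the four experiments vary.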

Related works
Artificial immune system
CLONALG
Immunos81
Feature selection
Principal component analysis
Correlation-based feature selection
Dataset selection
Variable selection
Performance measurements criteria
Experiment 1
Experiment 2
Experiment 3
Experiment 4
Findings
Summary and conclusion