Abstract

Data science is becoming increasingly important for software engineering problems. Software defect prediction is a critical area: it can help a development team allocate testing resources efficiently and better understand the root causes of defects. It can also help explain why a component, or even an entire project, is failure-prone. This paper addresses the binary classification problem of predicting whether a software component contains a bug, using three widely used machine learning algorithms: Random Forest (RF), Neural Networks (NN), and Support Vector Machine (SVM). Code metrics and process metrics are combined as defect indicators for the Eclipse environment, and the three algorithms are applied to a sample of weekly Eclipse features. Feature reduction using a General Linear Model (GLM) is also adopted to save computational time. The results confirm the predictive capability of two features, NBD_max and Pre-defects: models using only these two features perform comparably to models using all 61 features. Additionally, this paper evaluates the performance of the three algorithms; NN and RF turn out to have the best fit.
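The feature-reduction idea can be illustrated with a minimal sketch. It does not reproduce the paper's GLM-based procedure: as a simpler stand-in, it ranks features by the absolute Pearson correlation between each feature and the binary defect label and keeps the top two. The data are synthetic, and the names `pearson`, `top_k_features`, and `make_synthetic` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_features(rows, labels, k):
    """Return the indices of the k features most correlated with the label.

    Assumption: correlation ranking is a stand-in for the GLM-based
    reduction mentioned in the abstract, not the paper's actual method.
    """
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def make_synthetic(n=500, n_features=6, informative=(0, 3), seed=0):
    """Synthetic data: only the `informative` columns influence the label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        signal = sum(row[j] for j in informative) + rng.gauss(0, 0.5)
        rows.append(row)
        labels.append(1 if signal > 0 else 0)  # 1 = "defective" component
    return rows, labels


rows, labels = make_synthetic()
selected = top_k_features(rows, labels, k=2)
print("selected feature indices:", selected)
```

In a fuller pipeline, the selected columns would then be passed to each classifier (RF, NN, SVM), and the results compared against a model trained on all features.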
