On the effectiveness of early life cycle defect prediction with Bayesian Nets

Norman Fenton, Paul Krause, Łukasz Radliński, Martin Neil, William Marsh, Peter Hearty

doi:10.1007/s10664-008-9072-x

Abstract

Standard practice in building models in software engineering normally involves three steps: (1) collecting domain knowledge (previous results, expert knowledge); (2) building a skeleton of the model based on step 1, including as yet unknown parameters; (3) estimating the model parameters using historical data. Our experience shows that it is extremely difficult to obtain reliable data of the required granularity, or of the volume required to generalize our conclusions later. Therefore, in searching for a method for building a model, we cannot consider methods requiring large volumes of data. This paper discusses an experiment to develop a causal model (Bayesian net) for predicting the number of residual defects that are likely to be found during independent testing or operational usage. The approach supports (1) and (2), does not require (3), yet still makes accurate defect predictions (an R² of 0.93 between predicted and actual defects). Since our method does not require detailed domain knowledge, it can be applied very early in the process life cycle. The model takes as inputs a set of quantitative and qualitative factors describing a project and its development process. The model variables, as well as the relationships between them, were identified as part of a major collaborative project. A dataset, elicited from 31 completed software projects in the consumer electronics industry, was gathered using a questionnaire distributed to managers of recent projects. We used this dataset to validate the model by analyzing several popular evaluation measures (R², measures based on relative error, and Pred). The validation results also confirm the need for using the qualitative factors in the model. The dataset may be of interest to other researchers evaluating models with similar aims. Based on some typical scenarios, we demonstrate how the model can be used for better decision support in operational environments.
We also performed a sensitivity analysis to identify the variables with the greatest influence on the number of residual defects. This showed that project size, scale of distributed communication, and project complexity account for most of the variation in the number of defects in our model. We make both the dataset and causal model available for research use.
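
The evaluation measures named in the abstract (R², relative-error measures, and Pred) can be illustrated with a minimal sketch. The definitions below are common conventions and are assumptions here, since the abstract does not spell them out: R² is taken as the squared Pearson correlation between actual and predicted defect counts, MMRE as the mean magnitude of relative error, and Pred(l) as the fraction of projects whose relative error is at most l. The function names and the four data points are hypothetical and are not drawn from the paper's dataset.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values (assumed definition)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)


def relative_errors(actual, predicted):
    """Magnitude of relative error per project; requires non-zero actual values."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error across all projects."""
    mre = relative_errors(actual, predicted)
    return sum(mre) / len(mre)


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose relative error is at most `level`."""
    mre = relative_errors(actual, predicted)
    return sum(1 for m in mre if m <= level) / len(mre)


# Hypothetical defect counts for four projects (illustration only).
actual_defects = [100, 200, 50, 400]
predicted_defects = [110, 180, 70, 390]

if __name__ == "__main__":
    print("R^2     :", r_squared(actual_defects, predicted_defects))
    print("MMRE    :", mmre(actual_defects, predicted_defects))
    print("Pred(25):", pred(actual_defects, predicted_defects, level=0.25))
```

A high R² alone shows only that predictions track actual values linearly. MMRE and Pred add a view of how large the individual errors are, which is one reason to report several measures together.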

Journal: Empirical Software Engineering	Publication Date: Jun 27, 2008
Citations: 153

Similar Papers

Causal Artificial Intelligence Models of Food Quality Data.
Želimir Kurtanjek
Food Technology and Biotechnology | VOL. 62
07 Jan 2024

Revolutionizing steel building project cost overrun risk assessment by Bayesian network
Sou-Sen Leu ... Cathy Chang-Wei Hung
Engineering, Construction and Architectural Management
16 Jun 2023

Project Data Incorporating Qualitative Factors for Improved Software Defect Prediction
N Fenton ... L Radlinski
01 May 2007

Neural Network Approach for Software Defect Prediction Based on Quantitative and Qualitative Factors
Parvinder S Sandhu ... Dalveer Kaur Grewal
International Journal of Computer Theory and Engineering
01 Jan 2012
