Outlier elimination in construction of software metric models

Victor K Y Chan,W Eric Wong

doi:10.1145/1244002.1244319

Abstract

Software metric models are models relating various software metrics of software projects. Such models' purpose is to predict some of these metrics for certain future projects given the other metrics for those projects. The construction of software metric models derives such relationships and is usually based on data samples of concerned software metrics for past software projects. Often, in such a data sample, there are inevitably a few very extreme projects which have relationships among their metrics deviating substantially from those among the metrics for the remaining mainstream bulk of projects in the data sample. Such outlier projects exert considerable undue influence on the derivation of the said relationships during model construction in that the relationships so derived cannot candidly reflect the true mainstream relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we proposed a methodology to identify and thus eliminate such outliers prior to model construction. Our methodology makes use of the least of median squares (LMS) regression to uncover such outliers and is applicable irrespective of any subsequent model construction approaches. We also did a case study to apply our methodology, and the results prove our methodology being able to improve the prediction accuracy of most models experimented with in the study. Thus, our methodology is recommended for any further software metric model construction. This paper documents such a methodology and the successful case study.

Full Text