Abstract

Background. Outliers and corrupted data points may unduly bias software development effort estimation models. However, given the usually limited size of software engineering data sets, removing too many data points may seriously reduce the power of the statistical tests used and the likelihood of obtaining statistically significant results. Also, statistical techniques are typically based on assumptions that are either believed to be true a priori or, at best, checked via statistical tests, without ever achieving complete certainty that they hold. Estimation models based on less strict assumptions have broader applicability and a lower risk of leading to unwarranted conclusions. Aim. We investigate the usefulness of Robust Regression when building effort estimation models, by varying the degree of robustness and, thus, the number of data points that are excluded from the data analysis as outliers. Method. We used Least Quantile of Squares (LQS) Robust Regression, a generalization of Least Median of Squares (LMS) regression. LMS builds a regression line by minimizing the median squared residual. LQS minimizes the order statistic of the squared residuals corresponding to any specified quantile, not just the median, which is the order statistic corresponding to the 50% quantile. We extended a statistical significance test for univariate LQS regression models. We also built a weighted model, obtained from statistically significant LQS models, in which each LQS model contributes proportionally to the quantile used. Results. We applied LQS Linear Regression to estimate development effort on four projects from the PROMISE data set and obtained valid and significant univariate models. Conclusions. LQS may provide a valid alternative to LMS and Ordinary Least Squares regression for building estimation models when (1) balancing the need to exclude outliers against the need to keep enough data points to build statistically significant models and (2) using less strict assumptions underlying the regression technique.
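
To make the LQS criterion described in the Method concrete, the following is a minimal Python sketch, not the authors' implementation. It approximates a univariate LQS fit by searching only the lines that pass through pairs of data points, a common elemental-subset heuristic that is not guaranteed to find the exact minimizer. The names `lqs_fit` and `weighted_prediction` are hypothetical, the rank rule `ceil(quantile * n)` is an assumption (other conventions exist), and the weighted combination is only one possible reading of "contributes proportionally to the quantile used". The significance test mentioned in the abstract is not reproduced here.

```python
import itertools
import math


def lqs_fit(x, y, quantile=0.5):
    """Approximate univariate LQS fit.

    Searches the lines through every pair of points and keeps the one whose
    squared residual at the chosen order statistic is smallest.
    Returns (slope, intercept, criterion).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")

    # Rank of the order statistic to minimize (assumed convention).
    # quantile=0.5 gives an LMS-style fit; quantile=1.0 minimizes the
    # largest squared residual, so no point is treated as an outlier.
    h = max(1, math.ceil(quantile * n))

    best = None
    for i, j in itertools.combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        squared = sorted((yk - (intercept + slope * xk)) ** 2
                         for xk, yk in zip(x, y))
        criterion = squared[h - 1]
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)

    if best is None:
        raise ValueError("all x values are identical")
    criterion, slope, intercept = best
    return slope, intercept, criterion


def weighted_prediction(models, x0):
    """Combine predictions of several LQS models at x0.

    `models` is a list of (quantile, slope, intercept) tuples, assumed to
    have already passed a significance check. Each model's weight is its
    quantile divided by the sum of all quantiles (an assumed interpretation).
    """
    total = sum(q for q, _, _ in models)
    return sum(q / total * (b + a * x0) for q, a, b in models)
```

For example, with hypothetical data in which eight of ten points lie on a line and two are gross outliers, `lqs_fit(x, y, quantile=0.8)` can recover the line through the eight clean points, while `quantile=1.0` is forced to accommodate the outliers. Lowering the quantile excludes more points from influencing the fit, which is the trade-off the abstract describes.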

