Effects of measurements on correlations of software code metrics

Md Abdullah Al Mamun, Jörgen Hansson, Christian Berger

doi:10.1007/s10664-019-09714-9


Open Access


Abstract

Context: Software metrics play a significant role in many areas of the software life cycle, including forecasting defects and predicting maintenance effort, cost, etc. through predictive analysis. Many studies have found code metrics so highly correlated with one another that they are considered redundant, which implies that it is enough to track a single metric from a set of highly correlated metrics.

Objective: Software is developed incrementally over time. Traditionally, code metrics are measured cumulatively, i.e., as a cumulative (running) sum. When a code metric is measured from the values of individual revisions or commits, without consolidating values from past revisions, it reflects the natural development of the software; this study calls such a measure organic. Density and average are two other ways of measuring metrics. This empirical study investigates whether measurement types influence the correlations of code metrics.

Method: To investigate this objective, the study collected 24 code metrics, classified into four categories according to their measurement types, from 11,874 software revisions (i.e., commits) of 21 open source projects from eight well-known organizations. Kendall’s τ-b is used to compute correlations. To determine whether there is a significant difference between cumulative and organic metrics, the Mann-Whitney U test, the Wilcoxon signed-rank test, and the paired-samples sign test are performed.

Results: The cumulative metrics are highly correlated with each other, with an average coefficient of 0.79; for the corresponding organic metrics, the average is 0.49. When individual correlation coefficients of the two measurement types are compared, correlations between organic metrics are significantly lower (p < 0.01) than those between cumulative metrics. Our results indicate that the cumulative nature of metrics makes them highly correlated, implying that cumulative measurement is a major source of collinearity between cumulative metrics. Another interesting observation is that correlations between metrics from different categories are weak.

Conclusions: The results of this study reveal that measurement types may have a significant impact on the correlations of code metrics and that transforming metrics into a different type can yield metrics with low collinearity. These findings provide a simple understanding of how transforming features to a different measurement type can produce new, non-collinear input features for predictive models.
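Why cumulative measurement can inflate correlations is easiest to see on a toy example. The sketch below assumes that an organic value is a per-revision count (for example, lines added in one commit) and that the cumulative value is the running sum of those counts. The two metric series are synthetic, not data from the study, and `kendall_tau_b` is a plain O(n²) implementation written for illustration, not any library's API. Because every per-revision count here is positive, both cumulative series rise monotonically with the revision index, so every pair of revisions is concordant and τ-b reaches 1 even though the organic series are negatively correlated.

```python
from itertools import accumulate, combinations
import math


def to_cumulative(organic):
    """Running sum: each revision's value includes all earlier revisions."""
    return list(accumulate(organic))


def to_organic(cumulative):
    """Per-revision contribution: difference from the previous running total."""
    previous = [0] + list(cumulative[:-1])
    return [cur - prev for prev, cur in zip(previous, cumulative)]


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected rank correlation) via an O(n^2) pair scan."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from both denominator terms
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Number of pairs not tied in x, and not tied in y, respectively.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # a constant sequence has no defined rank correlation
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


if __name__ == "__main__":
    # Synthetic per-commit values of two metrics over six revisions.
    organic_a = [12, 3, 8, 5, 10, 2]
    organic_b = [2, 6, 1, 5, 4, 3]
    cumulative_a = to_cumulative(organic_a)
    cumulative_b = to_cumulative(organic_b)
    print(f"organic tau-b:    {kendall_tau_b(organic_a, organic_b):.3f}")  # -0.333
    print(f"cumulative tau-b: {kendall_tau_b(cumulative_a, cumulative_b):.3f}")  # 1.000
```

Real cumulative metrics are not always monotone (code can be deleted), so observed cumulative correlations are usually below 1; the sketch only shows why running sums tend to push rank correlations upward.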
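The comparison of individual coefficients can be illustrated with an exact, two-sided paired-samples sign test, one of the three tests named above (the Mann-Whitney U and Wilcoxon signed-rank tests are not sketched here). This assumes each pair holds the τ-b of the same two metrics measured once cumulatively and once organically; the coefficient values are invented for illustration, and `sign_test` is a hypothetical helper, not code from the study.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided paired-samples sign test on the differences y - x.

    Pairs with a zero difference are dropped. Under the null hypothesis,
    positive and negative differences are equally likely, so the number of
    positive differences follows Binomial(n, 0.5).
    Returns (n_positive, n_negative, p_value).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = n - n_pos
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Hypothetical tau-b values for the same ten metric pairs.
    cumulative = [0.91, 0.85, 0.78, 0.88, 0.70, 0.82, 0.95, 0.66, 0.74, 0.80]
    organic = [0.52, 0.47, 0.80, 0.40, 0.35, 0.58, 0.44, 0.49, 0.30, 0.55]
    n_pos, n_neg, p = sign_test(cumulative, organic)
    # prints: organic higher: 1, organic lower: 9, p = 0.0215
    print(f"organic higher: {n_pos}, organic lower: {n_neg}, p = {p:.4f}")
```

The sign test uses only the direction of each paired difference, which makes it robust but less powerful than the Wilcoxon signed-rank test, which also uses the ranks of the difference magnitudes.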

Highlights

This empirical research investigates whether the measurement types of software code metrics affect their correlations. More research is needed to confirm the findings.

Summary

Introduction

The exponential growth of software size (Deshpande and Riehle 2008) brings many challenges related to maintainability, release planning, and other software qualities. Growing software size and complexity have made it increasingly difficult to select the features to be implemented in a product release and have challenged existing assumptions and approaches for release planning (Jantunen et al 2011). Validating software metrics has gained importance as predicting external software qualities, which is needed to manage future revisions of software, becomes more demanding. Since prediction models are often multivariate, i.e., they use more than one independent feature or metric, it is important that there is no significant collinearity among the independent features. Collinearity results in two major problems (Meloun et al 2002); one is that it makes a model less useful, because the individual effects of the independent features on a dependent feature can no longer be isolated. El Emam and Schneidewind (2000) and Dormann et al (2013) suggested diagnosing collinearity among the independent features for a proper interpretation of regression models.
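One widely used way to diagnose collinearity among independent features is the variance inflation factor (VIF); it is shown here as a common example, not as the diagnostic prescribed by the works cited above. The sketch computes VIF_j as the j-th diagonal entry of the inverse Pearson correlation matrix of the features, which equals 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the others. The feature names and values are hypothetical, and the hand-written Gauss-Jordan inversion is only meant for a handful of features.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant feature: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear features")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(features):
    """features: dict mapping a feature name to its values, one per observation."""
    names = list(features)
    corr = [[pearson(features[a], features[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical per-module metrics: two size-like features and one ratio.
    features = {
        "loc": [120, 340, 560, 210, 890, 450],
        "statements": [80, 250, 400, 150, 610, 330],
        "comment_ratio": [0.30, 0.12, 0.25, 0.40, 0.18, 0.22],
    }
    for name, value in vif(features).items():
        print(f"{name}: VIF = {value:.2f}")
```

A VIF of 1 means a feature is uncorrelated with the others; thresholds of 5 or 10 are commonly cited as signs of problematic collinearity, although the cutoff is a judgment call.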



Journal: Empirical Software Engineering
Publication Date: May 16, 2019
Citations: 10
License type: open-access


Similar Papers

An Empirical Study of Comparison of Code Metric Aggregation Methods and Software Reliability Evaluation
Zekun Song ... Pengyang Zong
01 Jan 2020

Correlations of software code metrics
Md Abdullah Al Mamun ... Christian Berger
25 Oct 2017

An Empirical Study of Comparison of Code Metric Aggregation Methods–on Embedded Software
Zekun Song ... Zhiwei Ren
01 Jul 2019

Towards a software defect proneness model: feature selection
Vitaliy S Yakovyna ... Ivan I Symets
Applied Aspects of Information Technology | VOL. 4
21 Dec 2021
