Performance Analysis of Machine Learning Approaches in Software Complexity Prediction

Sayed Moshin Reza,Md Mahfujur Rahman,Shamim Al Mamun,Omar Badreddin,Hasnat Parvez

doi:10.1007/978-981-33-4673-4_3

Abstract

Software design is one of the core concepts in software engineering. This covers insights and intuitions of software evolution, reliability, and maintainability. Effective software design facilitates software reliability and better quality management during development which reduces software development cost. Therefore, it is required to detect and maintain these issues earlier. Class complexity is one of the ways of detecting software quality. The objective of this paper is to predict class complexity from source code metrics using machine learning (ML) approaches and compare the performance of the approaches. In order to do that, we collect ten popular and quality maintained open source repositories and extract 18 source code metrics that relate to complexity for class-level analysis. First, we apply statistical correlation to find out the source code metrics that impact most on class complexity. Second, we apply five alternative ML techniques to build complexity predictors and compare the performances. The results report that the following source code metrics: Depth inheritance tree (DIT), response for class (RFC), weighted method count (WMC), lines of code (LOC), and coupling between objects (CBO) have the most impact on class complexity. Also, we evaluate the performance of the techniques, and results show that random forest (RF) significantly improves accuracy without providing additional false negative or false positive that work as false alarms in complexity prediction.

Full Text