ContextInsufficient code understandability makes software difficult to inspect and maintain and is a primary cause of software development cost. Several source code measures may be used to identify difficult-to-understand code, including well-known ones such as Lines of Code and McCabe’s Cyclomatic Complexity, and novel ones, such as Cognitive Complexity.ObjectiveWe investigate whether and to what extent source code measures, individually or together, are correlated with code understandability.MethodWe carried out an empirical study with students who were asked to carry out realistic maintenance tasks on methods from real-life Open Source Software projects. We collected several data items, including the time needed to correctly complete the maintenance tasks, which we used to quantify method understandability. We investigated the presence of correlations between the collected code measures and code understandability by using several Machine Learning techniques.ResultsWe obtained models of code understandability using one or two code measures. However, the obtained models are not very accurate, the average prediction error being around 30%.ConclusionsBased on our empirical study, it does not appear possible to build an understandability model based on structural code measures alone. Specifically, even the newly introduced Cognitive Complexity measure does not seem able to fulfill the promise of providing substantial improvements over existing measures, at least as far as code understandability prediction is concerned. It seems that, to obtain models of code understandability of acceptable accuracy, process measures should be used, possibly together with new source code measures that are better related to code understandability.
Read full abstract