Some Code Smells Have a Significant but Small Effect on Faults

Tracy Hall,David Bowes,Min Zhang,Yi Sun

doi:10.1145/2629648

Abstract

We investigate the relationship between faults and five of Fowler et al.'s least-studied smells in code: Data Clumps, Switch Statements, Speculative Generality, Message Chains, and Middle Man. We developed a tool to detect these five smells in three open-source systems: Eclipse, ArgoUML, and Apache Commons. We collected fault data from the change and fault repositories of each system. We built Negative Binomial regression models to analyse the relationships between smells and faults and report the McFadden effect size of those relationships. Our results suggest that Switch Statements had no effect on faults in any of the three systems; Message Chains increased faults in two systems; Message Chains which occurred in larger files reduced faults; Data Clumps reduced faults in Apache and Eclipse but increased faults in ArgoUML; Middle Man reduced faults only in ArgoUML, and Speculative Generality reduced faults only in Eclipse. File size alone affects faults in some systems but not in all systems. Where smells did significantly affect faults, the size of that effect was small (always under 10 percent). Our findings suggest that some smells do indicate fault-prone code in some circumstances but that the effect that these smells have on faults is small. Our findings also show that smells have different effects on different systems. We conclude that arbitrary refactoring is unlikely to significantly reduce fault-proneness and in some cases may increase fault-proneness.

Full Text