Fault-Prone Module Prediction Approaches Using Identifiers in Source Code

Osamu Mizuno,Kimiaki Kawamoto,Naoki Kawashima

doi:10.4018/ijsi.2015010103

Abstract

Prediction of fault-prone modules is an important area of software engineering. The authors assumed that the occurrence of faults is related to the semantics in the source code modules. Semantics in a software module can be extracted from identifiers in the module. Identifiers such as variable names and function names in source code are thus essential information to understand code. The naming for identifiers affects on code understandability; thus, the authors expect that they affect software quality. In this study, the authors examine the relationship between the length of identifiers and existence of software faults in a software module. Furthermore, the authors analyze the relationship between occurrence of “words” in identifiers and the existence of faults. From the experiments using the data from open source software, the authors modeled the relationship between the fault occurrence and the length of identifiers, and the relationship between the fault occurrence and the word in identifiers by the random forest technique. The result of the experiment showed that the length of identifiers can predict the fault-proneness of the software modules. Also, the result showed that the word occurrence model is as good a measure as traditional CK and LOC metrics models.

Full Text