An application of a rule-based model in software quality classification

Taghi M Khoshgoftaar,Lofton A Bullard,Kehan Gao

doi:10.1109/icmla.2007.69

Abstract

A new rule-based classification model (RBCM) and rulebased model selection technique are presented. The RBCM utilizes rough set theory to significantly reduce the number of attributes, discretation to partition the domain of attribute values, and Boolean predicates to generate the decision rules that comprise the model. When the domain values of an attribute are continuous and relatively large, rough set theory requires that they be discretized. The subsequent discretized domain must have the same characteristics as the original domain values. However, this can lead to a large number of partitions of the attribute's domain space, which in turn leads to large rule sets. These rule sets tend to form models that over-fit. To address this issue, the proposed rule-based model adopts a new model selection strategy that minimizes over-fitting for the RBCM. Empirical validation of the RBCM is accomplished through a case study on a large legacy telecommunications system. The results demonstrate that the proposed RBCM and the model selection strategy are effective in identifying the classification model that minimizes over-fitting and high cost classification errors. Keywords: rule-based classification model, rough set, reducts, discretization, software quality classification

Full Text