Abstract

Metric-driven classification models identify software components with user-specifiable properties, such as components that are likely to be fault-prone, to require high development effort, or to have faults in a certain class. These models are generated automatically from past metric data, scale to large systems, and can be calibrated to different projects. They also serve as extensible integration frameworks for software metrics, because they allow new metrics to be added and they integrate symbolic and numeric data from all four measurement abstractions. In our past work, we developed and evaluated techniques for generating tree-based classification models. In this paper, we investigate a technique for generating network-based classification models. The principle underlying the tree-based models is partitioning, while the principle underlying the network-based models is pattern matching. Tree-based models prune away information and can be decomposed, while network-based models retain all information and tend to be more complex. We evaluate the predictive accuracy of network-based models and compare them with the tree-based models. The evaluative study uses metric data from 16 NASA production systems ranging in size from 3000 to 112,000 source lines. The goal of the classification models is to identify the software components in these systems that had "high" development faults or effort, where "high" is defined as the uppermost quartile relative to past data. The models are derived from 74 candidate metrics that capture a multiplicity of information about the components: development effort, faults, changes, design style, and implementation style. A total of 1920 tree- and network-based models are generated automatically, and their predictive accuracies are compared in terms of correctness, completeness, and consistency using a non-parametric analysis of variance model.
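The target class and the three accuracy measures can be made concrete with a minimal sketch. It assumes that "uppermost quartile relative to past data" means values strictly above the third quartile (Q3) of past data, computed with Python's `statistics.quantiles` (the quartile convention is an assumption), and it assumes the usual definitions of the measures: correctness is the share of all components classified correctly, completeness is the share of actual target-class components that were identified, and consistency is the share of components predicted to be in the target class that actually are. The function names and data values are hypothetical illustrations, not taken from the study.

```python
import statistics

def upper_quartile_threshold(past_values):
    """Return the third-quartile cut point (Q3) of past metric data.

    Assumption: quartiles use statistics.quantiles with method="inclusive";
    the quartile convention is not specified in the abstract.
    """
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    return statistics.quantiles(past_values, n=4, method="inclusive")[2]

def label_target_class(values, threshold):
    """Label a component 'high' (True) if its value lies strictly above Q3."""
    return [v > threshold for v in values]

def accuracy_measures(actual, predicted):
    """Return (correctness, completeness, consistency) as percentages.

    Assumed definitions (not quoted from the paper):
      correctness  = correctly classified components / all components
      completeness = actual target-class components that were predicted
                     / all actual target-class components
      consistency  = predicted target-class components that are actually
                     target class / all predicted target-class components
    """
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    hits = sum(1 for a, p in pairs if a and p)  # target class, predicted as such
    actual_pos = sum(1 for a, _ in pairs if a)
    predicted_pos = sum(1 for _, p in pairs if p)
    correctness = 100.0 * correct / len(pairs)
    completeness = 100.0 * hits / actual_pos if actual_pos else float("nan")
    consistency = 100.0 * hits / predicted_pos if predicted_pos else float("nan")
    return correctness, completeness, consistency

# Hypothetical data: effort values from past projects and from a new project.
past_effort = [10, 20, 30, 40, 50, 60, 70, 80, 90]
q3 = upper_quartile_threshold(past_effort)              # 70.0
actual = label_target_class([15, 72, 95, 40, 70], q3)   # [False, True, True, False, False]
predicted = [False, True, False, True, False]           # output of some model
print(accuracy_measures(actual, predicted))             # (60.0, 50.0, 50.0)
```

In this toy run the model classifies three of five components correctly (60% correctness), finds one of the two "high" components (50% completeness), and is right about one of its two "high" predictions (50% consistency), which shows how the three measures can diverge for the same set of predictions.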
On average, the predictions from the network-based models had 89.6% correctness, 69.1% completeness, and 79.5% consistency, while those from the tree-based models had 82.2% correctness, 56.3% completeness, and 74.5% consistency. The network-based models had statistically higher correctness and completeness than the tree-based models, but the two were not statistically different in terms of consistency. Capabilities to generate metric-driven classification models will be supported in the Amadeus measurement-driven analysis and feedback system.
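The contrast between partitioning (tree-based) and pattern matching (network-based) can be illustrated with a toy sketch. It deliberately uses two generic stand-ins that are not the paper's techniques: a single-split decision stump represents partitioning, and a 1-nearest-neighbor matcher represents pattern matching. The function names, the two-metric training data, and the query components are hypothetical.

```python
import math

def fit_stump(rows, labels):
    """Partitioning stand-in: keep only the single (metric, threshold) split
    with the fewest training errors; all other training information is dropped."""
    best = None
    for m in range(len(rows[0])):
        for t in sorted({r[m] for r in rows}):
            # Predict target class for components whose metric m exceeds t.
            errors = sum((r[m] > t) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, m, t)
    _, metric, threshold = best
    # The fitted model is a readable rule: "metric > threshold".
    return metric, threshold, (lambda row: row[metric] > threshold)

def fit_nearest_neighbor(rows, labels):
    """Pattern-matching stand-in: retain every training vector and label a new
    component like its closest stored pattern (Euclidean distance)."""
    memory = list(zip(rows, labels))
    def predict(row):
        _, label = min(memory, key=lambda item: math.dist(item[0], row))
        return label
    return predict

# Hypothetical training data: two metric values per component, True = "high".
train_rows = [(1, 2), (2, 1), (8, 9), (9, 8), (2, 9)]
train_labels = [False, False, True, True, True]

metric, threshold, stump = fit_stump(train_rows, train_labels)
nearest = fit_nearest_neighbor(train_rows, train_labels)
print(metric, threshold)               # 1 2
print(stump((1, 3)), nearest((1, 3)))  # True False
```

The stump reduces the training set to one metric and one threshold, so it is easy to read and decompose but discards everything else; the nearest-neighbor matcher keeps all five training vectors. For the query (1, 3) the two disagree: the stump sees a second metric above 2, while the closest stored pattern, (1, 2), is not in the target class.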