Evaluating techniques for generating metric-based classification trees

Adam A Porter, Richard W Selby

doi:10.1016/0164-1212(90)90041-j

Abstract

Metric-based classification trees provide an approach for identifying user-specified classes of high-risk software components throughout the software lifecycle. Based on measurable attributes of software components and processes, this empirically guided approach derives models of problematic software components. These models, which are represented as classification trees, are used on future systems to identify components likely to share the same high-risk properties. Example high-risk component properties include being fault-prone, change-prone, or effort-prone, or containing certain types of faults. Identifying these components allows developers to focus the application of specialized techniques and tools for analyzing, testing, and constructing software. A validation study using metric data from 16 NASA systems showed that the trees had an average classification accuracy of 79.3% for fault-prone and effort-prone components in that environment. One fundamental feature of the classification tree generation algorithm is the method used for partitioning the metric data values into mutually exclusive and exhaustive ranges. This study compares the accuracy and the complexity of trees resulting from five techniques for partitioning metric data values. The techniques are quartiles, octiles, and three methods based on least weight subsequence (LWS-χ) analysis, where χ is the upper bound on the number of partitions. The LWS-3 and LWS-5 partition techniques resulted in trees with higher accuracy (in terms of completeness and consistency) than did quartiles and octiles. LWS-3 and LWS-5 trees were not statistically different in terms of accuracy, but LWS-3 trees had lower complexity than all other methods in terms of the number of unique metrics required. The trees from the three LWS methods (LWS-3, LWS-5, and LWS-8) had lower complexity than did the trees from quartiles and octiles.
In general, the results indicate that distribution-sensitive partition techniques that use only relatively few partitions, such as the least weight subsequence techniques LWS-3 and LWS-5, can increase accuracy and decrease complexity in classification trees. Classification analysis techniques, along with other empirically based analysis techniques for large-scale software, will be supported in the Amadeus measurement and empirical analysis system.
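The quartile and octile techniques named in the abstract can be illustrated with a minimal sketch. It assumes that cut points are taken from the empirical quantiles of the observed metric values and that each value falls into exactly one range, so the ranges are mutually exclusive and exhaustive. The function names, the interpolation method, and the sample values are hypothetical and are not taken from the study; the LWS techniques are not shown because the abstract does not specify their weighting.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cutpoints(values, n_bins):
    """Return the n_bins - 1 interior cut points that split `values`
    into n_bins quantile-based ranges (assumed 'inclusive' interpolation)."""
    return quantiles(values, n=n_bins, method="inclusive")


def assign_partition(value, cutpoints):
    """Map a metric value to exactly one partition index in 0..len(cutpoints).

    Values below the first cut point go to partition 0 and values at or
    above the last cut point go to the final partition, so every value
    lands in one and only one range.
    """
    return bisect_right(cutpoints, value)


# Hypothetical metric values (for example, source lines per component).
# These are illustrative only and are not data from the study.
metric_values = [12, 35, 7, 88, 140, 23, 56, 301, 19, 64, 45, 97]

quartile_cuts = quantile_cutpoints(metric_values, 4)  # 3 cut points, 4 ranges
octile_cuts = quantile_cutpoints(metric_values, 8)    # 7 cut points, 8 ranges

quartile_bins = [assign_partition(v, quartile_cuts) for v in metric_values]
octile_bins = [assign_partition(v, octile_cuts) for v in metric_values]
```

Quantile cut points depend only on the rank order of the values, which is why the abstract contrasts them with distribution-sensitive techniques such as LWS that use fewer partitions.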

Journal: Journal of Systems and Software
Publication Date: Jul 1, 1990
Citations: 52

Similar Papers

Assertion-Based Validation of Modified Programs
Bogdan Korel ... Li Tao
01 Apr 2009

Fault analysis in solar PV arrays under: Low irradiance conditions and reverse connections
Ye Zhao ... Jean-Francois De Palma
01 Jun 2011

Metrics based classification trees for software test monitoring and management
R.A Paul
06 Nov 1994

Software component models
Kung-Kiu Lau
28 May 2006
