Abstract
Metric-based classification trees provide an approach for identifying user-specified classes of high-risk software components throughout the software lifecycle. Based on measurable attributes of software components and processes, this empirically guided approach derives models of problematic software components. These models, which are represented as classification trees, are used on future systems to identify components likely to share the same high-risk properties. Example high-risk component properties include being fault-prone, change-prone, or effort-prone, or containing certain types of faults. Identifying these components allows developers to focus the application of specialized techniques and tools for analyzing, testing, and constructing software. A validation study using metric data from 16 NASA systems showed that the trees had an average classification accuracy of 79.3% for fault-prone and effort-prone components in that environment. One fundamental feature of the classification tree generation algorithm is the method used for partitioning the metric data values into mutually exclusive and exhaustive ranges. This study compares the accuracy and the complexity of trees resulting from five techniques for partitioning metric data values. The techniques are quartiles, octiles, and three methods based on least weight subsequence (LWS-χ) analysis, where χ is the upper bound on the number of partitions. The LWS-3 and LWS-5 partition techniques resulted in trees with higher accuracy (in terms of completeness and consistency) than did quartiles and octiles. LWS-3 and LWS-5 trees were not statistically different in terms of accuracy, but LWS-3 trees had lower complexity than all other methods in terms of the number of unique metrics required. The trees from the three LWS methods (LWS-3, LWS-5, and LWS-8) had lower complexity than did the trees from quartiles and octiles.
In general, the results indicate that distribution-sensitive partition techniques that use only relatively few partitions, such as the least weight subsequence techniques LWS-3 and LWS-5, can increase accuracy and decrease complexity in classification trees. Classification analysis techniques, along with other empirically based analysis techniques for large-scale software, will be supported in the Amadeus measurement and empirical analysis system.
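The two fixed-quantile techniques named above (quartiles and octiles) can be illustrated with a minimal sketch. It assumes that partitioning means cutting a metric's observed values at equally spaced quantiles and assigning each value to exactly one half-open range, which makes the ranges mutually exclusive and exhaustive. The function names and the sample metric values are hypothetical and are not taken from the study. The LWS techniques are not shown because the abstract does not specify their weight function.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cutpoints(values, n_parts):
    """Return the n_parts - 1 interior cut points that split the
    observed values into n_parts quantile ranges."""
    return quantiles(values, n=n_parts, method="inclusive")


def assign_partition(value, cutpoints):
    """Map a value to the index of its range.

    Ranges are half-open, [cut_i, cut_{i+1}), with the first and last
    ranges unbounded, so every value falls in exactly one range.
    """
    return bisect_right(cutpoints, value)


# Hypothetical metric values (for example, a size metric per component).
metric_values = [12, 40, 55, 80, 95, 130, 210, 400]

quartile_cuts = quantile_cutpoints(metric_values, 4)  # 3 cut points
octile_cuts = quantile_cutpoints(metric_values, 8)    # 7 cut points

quartile_bins = [assign_partition(v, quartile_cuts) for v in metric_values]
octile_bins = [assign_partition(v, octile_cuts) for v in metric_values]
```

Quartiles and octiles place cut points by rank alone. The abstract contrasts this with the distribution-sensitive LWS techniques.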