Abstract
The reliability of a software system depends on its fault-prone modules: the fewer fault-prone modules it contains, the more we can trust it. Therefore, if we can predict the number of fault-prone modules in a piece of software, we can assess its reliability. Software metrics are one of the main inputs to this prediction, because they allow software modules to be classified as fault-prone or non-fault-prone. To perform this classification, we investigated 17 classifier methods whose features (attributes) are 39 software metrics and whose instances are the software modules of 13 datasets published by NASA. Two issues strongly affect prediction accuracy when data mining methods are applied to this problem: (1) selecting the best, most influential features (i.e., software metrics) from a large and diverse set, and (2) sampling instances to balance the two classes, since with imbalanced classes the classifier is biased towards the majority class. Based on feature selection and instance sampling, we considered four scenarios for evaluating the 17 classifier methods in predicting fault-prone modules. For feature selection we used Correlation-based Feature Selection (CFS), and for instance sampling we used the Synthetic Minority Oversampling Technique (SMOTE). Empirical results showed that appropriate sampling of software modules significantly influences the accuracy of predicting software reliability, whereas metric selection has no considerable effect on the prediction.
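To make the instance-sampling step concrete, the sketch below illustrates the core SMOTE idea: each synthetic minority-class (e.g., fault-prone) module is created by interpolating between a real minority sample and one of its k nearest minority-class neighbours in metric space. This is a simplified, hypothetical illustration written in plain Python, not the authors' implementation. The function name `smote`, its parameters, the Euclidean distance, and the choice of picking base samples at random (rather than the original round-robin over all minority samples) are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Simplified SMOTE-style oversampling (hypothetical sketch).

    minority: list of feature vectors (lists of floats) of equal length,
              e.g. metric values of fault-prone modules.
    Returns n_synthetic new vectors, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    # Precompute the k nearest minority-class neighbours of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))        # base sample (assumption: random pick)
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # one of its k nearest neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Usage: add 6 synthetic fault-prone modules described by two metrics.
fault_prone = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_modules = smote(fault_prone, n_synthetic=6, k=2, seed=0)
print(len(new_modules))  # 6
```

In practice, one would typically choose `n_synthetic` so that the minority class roughly matches the majority class in size, and would rely on a maintained library (for example, imbalanced-learn's `SMOTE`) rather than a hand-written version like this one.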