Abstract

<p>Predicting software defects before they surface is crucial in software engineering because it affects the quality and reliability of software systems. Previous studies on software defect prediction have typically used software features, such as code size, complexity, coupling, cohesion, inheritance, and other software metrics, to forecast whether a code file or commit is likely to be defective in the future. However, it is advantageous to limit the number of features used in a defect prediction model, both to avoid the problems associated with multicollinearity and the “curse of dimensionality” and to simplify data analysis. With fewer features, the model can concentrate on the most significant variables, which can improve its accuracy. This paper investigates the impact of eight feature selection methods on the accuracy and stability of six supervised learning models. The study is novel in that it exhaustively evaluates each of the eight feature selection techniques with each of the six supervised learning models. Two notable findings emerged. First, the association-based and coherence-based techniques achieved the highest accuracy in defect prediction, and the models that used the selected features outperformed those that used the original features. Second, Correlation feature selection, Recursive feature elimination, and Ridge feature selection, when combined with the Support vector machine and Decision tree classifiers, consistently selected low-variance features across multiple supervised defect prediction models. When combined with different classifiers, these techniques performed strongly on the publicly available NASA datasets CM1 and PC2: accuracy exceeded 85% for CM1 and 95% for PC2, with precision, recall, and F-measure values exceeding 95%. These were the highest results obtained in the evaluation.</p>
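
To make the idea of filter-style feature selection concrete, the following is a minimal sketch, not the paper's method. It ranks features by the absolute Pearson correlation between each feature and the defect label and keeps the top k. This is a simplification of correlation-based selection: it ignores redundancy between features, which fuller approaches also account for. The metric names (`loc`, `cyclomatic`, `comment_ratio`), the toy values, and the helper names `pearson` and `select_top_k` are all hypothetical and are not taken from the CM1 or PC2 datasets.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_top_k(rows, labels, names, k):
    """Rank features by |correlation with the label| and return the k best names."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    scores = {
        name: abs(pearson(col, labels)) for name, col in zip(names, columns)
    }
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: one row per module, label 1 = defective, 0 = clean.
names = ["loc", "cyclomatic", "comment_ratio"]
rows = [
    (120, 14, 0.30),
    (40, 3, 0.25),
    (200, 22, 0.10),
    (35, 2, 0.40),
    (150, 18, 0.35),
    (60, 5, 0.20),
]
labels = [1, 0, 1, 0, 1, 0]

selected, scores = select_top_k(rows, labels, names, k=2)
print(selected)
```

In a study like the one described, the reduced feature set would then be passed to each classifier, and accuracy would be compared against the same classifier trained on all features.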

