Abstract

Defect prediction aims to estimate software reliability by learning from historical defect data. A defect prediction method identifies whether a software module is defect-prone according to metrics mined from software projects. These metric values, also known as features, may be irrelevant or redundant, which hurts the performance of defect prediction methods. Existing work employs feature selection to preprocess defect data and filter out useless features. In this paper, we propose a novel feature selection framework, MICHAC, short for defect prediction via Maximal Information Coefficient with Hierarchical Agglomerative Clustering. MICHAC consists of two major stages. First, MICHAC employs the maximal information coefficient to rank candidate features and filter out irrelevant ones. Second, MICHAC groups the remaining features with hierarchical agglomerative clustering and selects one feature from each resulting group to remove redundant features. We evaluate our proposed method on 11 widely studied NASA projects and four open-source AEEEM projects using three different classifiers with four performance metrics (precision, recall, F-measure, and AUC). Comparison with five existing methods demonstrates that MICHAC is effective in selecting features for defect prediction.
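
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions the abstract does not state: absolute Pearson correlation stands in for the maximal information coefficient, average linkage is used for the agglomerative step, the most relevant feature is taken as each group's representative, and the names `select_features`, `keep_ratio`, `n_clusters`, and the toy metric columns are all hypothetical.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation (ASSUMPTION: a simple stand-in for MIC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / math.sqrt(vx * vy)


def select_features(columns, labels, keep_ratio=0.5, n_clusters=2):
    """Two-stage selection: rank and filter, then cluster and pick.

    columns: dict mapping feature name -> list of metric values
    labels:  list of 0/1 defect labels, one per module
    keep_ratio and n_clusters are hypothetical tuning parameters.
    """
    # Stage 1: rank features by relevance to the label, keep the top fraction.
    relevance = {name: abs_pearson(vals, labels) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = ranked[:max(1, math.ceil(len(ranked) * keep_ratio))]

    # Stage 2: agglomerative clustering of the kept features, using
    # 1 - |correlation| as the distance and average linkage (ASSUMPTION).
    def dist(a, b):
        return 1.0 - abs_pearson(columns[a], columns[b])

    clusters = [[name] for name in kept]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the two closest clusters

    # One representative per cluster: the feature most relevant to the label.
    return sorted(max(c, key=relevance.get) for c in clusters)


# Toy data (hypothetical): eight modules, four candidate metrics.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "loc": [1, 2, 1, 2, 5, 6, 5, 6],
    "loc_doubled": [2, 4, 2, 4, 10, 12, 10, 12],  # redundant with "loc"
    "cyclomatic": [3, 1, 4, 1, 5, 9, 2, 6],
    "noise": [1, -1, 1, -1, 1, -1, 1, -1],        # unrelated to the label
}
selected = select_features(columns, labels, keep_ratio=0.75, n_clusters=2)
print(selected)
```

On this toy input the first stage drops the column that is unrelated to the label, and the second stage keeps only one of the two perfectly correlated size metrics. A closer reproduction would swap in a real MIC estimator for `abs_pearson` and choose the cut-off and number of groups as the paper specifies.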
