Abstract

Software defect prediction models play a key role in increasing the quality and reliability of software systems because they identify defect-prone source code components and assist testing activities during the development life cycle. Prior research has used supervised and unsupervised machine learning models for software defect prediction. Supervised defect prediction models require labeled data; however, obtaining labeled data of the desired quality and volume can be time-consuming and expensive. Unsupervised defect prediction models usually use clustering techniques to relax the labeled-data requirement; however, labeling the detected clusters as defective is a challenging task. The Pareto principle states that a small number of modules contain most of the defects. Inspired by the Pareto principle, this work proposes a novel unsupervised learning approach based on outlier detection. We hypothesize that defect-prone software components have different characteristics from other components and can be considered outliers; therefore, outlier detection techniques can be used to identify them. Experimental results on 16 software projects from two publicly available datasets (PROMISE and GitHub) indicate that the k-Nearest Neighbor (KNN) outlier detection method can identify the majority of software defects. It detected 94% of expected defects in the best case and more than 63% of the defects in 75% of the projects. We compare our approach with state-of-the-art supervised and unsupervised defect prediction approaches. The results of rigorous empirical evaluations indicate that the proposed approach outperforms existing unsupervised models and achieves results comparable to those of leading supervised techniques that rely on complex training and tuning algorithms.
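
To make the outlier-detection idea concrete, the following is a minimal sketch of one common KNN outlier score: each module is scored by the distance to its k-th nearest neighbor, and the highest-scoring modules are flagged as likely defect-prone. It is an illustration under stated assumptions, not the implementation evaluated in this work. The function names, the two example metrics (lines of code and cyclomatic complexity), the value of k, and the flagged fraction are all hypothetical, and a real pipeline would likely normalize the metrics first.

```python
import math


def knn_outlier_scores(rows, k=2):
    """Score each row by the Euclidean distance to its k-th nearest neighbor.

    Assumption: the k-th-neighbor distance is used as the outlier score;
    other KNN variants (e.g. mean distance to the k neighbors) also exist.
    """
    if not 1 <= k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    scores = []
    for i, a in enumerate(rows):
        # Distances from row i to every other row, smallest first.
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(rows, k=2, fraction=0.2):
    """Return the sorted indices of the top `fraction` of rows by score.

    Assumption: a fixed fraction is flagged (hypothetical threshold choice).
    """
    scores = knn_outlier_scores(rows, k)
    n_flag = max(1, round(fraction * len(rows)))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical module metrics: (lines of code, cyclomatic complexity).
modules = [(100, 5), (110, 6), (95, 4), (105, 5), (900, 60)]
print(flag_outliers(modules, k=2, fraction=0.2))  # prints [4]
```

No labels are needed at any point: the unusually large and complex fifth module is flagged purely because it sits far from its neighbors in metric space.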
