Abstract

As software systems grow increasingly complex, identifying bugs and defects becomes pivotal for ensuring a seamless user experience and averting costly post-release issues. This study addresses this need by focusing on the application of active learning methods to code defect prediction. The investigation examines the efficacy of active learning combined with ensemble methods, leveraging the dynamic selection and labeling of training instances to improve model performance while reducing the demand for exhaustive labeling effort. Various traditional and ensemble methods are deployed with diverse query strategies (uncertainty, margin, and entropy sampling) to assess whether the active variants can rival the original approaches while substantially shrinking the training set. Evaluation encompasses classical classification metrics (AUC, Kappa, and MCC), supplemented by a proposed easy-to-interpret performance index that accounts not only for the traditional metric outcomes but also for the percentage of the initial dataset utilized, reflecting the dual nature of the problem. The results, presented through graphical representations and statistical tests, reveal the advantage of the active methods, which reduce the initial training set by at least 75% in approximately 64% of cases.
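The exact pipeline used in the study is not detailed in the abstract, but the three query strategies it names (uncertainty, margin, and entropy sampling) have standard definitions. The sketch below illustrates them under the assumption of a scikit-learn-style classifier exposing `predict_proba`; the pool, batch size, and `active_learning_round` loop are hypothetical and only meant to show how instances would be selected for labeling.

```python
import numpy as np

def uncertainty_sampling(probs: np.ndarray, n: int) -> np.ndarray:
    """Pick the n instances whose most likely class has the lowest probability."""
    uncertainty = 1.0 - probs.max(axis=1)
    return np.argsort(uncertainty)[-n:]

def margin_sampling(probs: np.ndarray, n: int) -> np.ndarray:
    """Pick the n instances with the smallest gap between the top two class probabilities."""
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margins)[:n]

def entropy_sampling(probs: np.ndarray, n: int) -> np.ndarray:
    """Pick the n instances with the highest predictive entropy."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-n:]

def active_learning_round(model, X_labeled, y_labeled, X_pool, y_pool,
                          query_fn, batch_size=10):
    """One hypothetical active-learning iteration: fit, query, move instances
    from the unlabeled pool into the labeled set (the oracle here is y_pool)."""
    model.fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)
    idx = query_fn(probs, batch_size)
    X_labeled = np.vstack([X_labeled, X_pool[idx]])
    y_labeled = np.concatenate([y_labeled, y_pool[idx]])
    X_pool = np.delete(X_pool, idx, axis=0)
    y_pool = np.delete(y_pool, idx, axis=0)
    return model, X_labeled, y_labeled, X_pool, y_pool
```

In such a loop, `model` could be any of the traditional or ensemble classifiers mentioned in the abstract; repeating the round until a budget is exhausted yields the reduced training sets whose size the proposed performance index takes into account.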
