Abstract

Defect identification is an important task for ensuring software quality. Recently, researchers have begun to use artificial intelligence techniques to improve the usability of static analysis (SA) tools by automatically identifying true defects among the reported SA alarms. Existing methods mainly use static code features to represent defective code. However, irrelevant and redundant features threaten the performance of these machine learning methods. Feature selection techniques can alleviate this problem. Because many feature selection methods have been proposed, this paper conducts a rigorous experimental evaluation of the impact of feature selection techniques on defect identification and explores whether there is a minimum ratio of selected features that still yields defect identification models with acceptable performance. Additionally, this paper proposes an effective feature selection approach based on majority voting, which combines the outputs of different feature selection techniques. Experimental results on five open-source projects show that there is a best feature selection ratio (20%) that achieves satisfactory performance while using far fewer features for defect identification. This finding can serve as a practical guideline for software defect identification.
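
The majority-voting idea can be illustrated with a minimal sketch. The abstract does not specify the voting rule, the underlying selectors, or the feature set, so everything below is an assumption made for illustration: the helper names `top_ratio` and `majority_vote`, the feature names, the scores, and the "more than half of the selectors" threshold are all hypothetical. Each selector keeps the top fraction of features by its own score, and a feature survives only if a majority of selectors kept it.

```python
from collections import Counter
from math import ceil


def top_ratio(scores, ratio=0.2):
    """Return the top `ratio` fraction of features by score (at least one)."""
    k = max(1, ceil(len(scores) * ratio))
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def majority_vote(selections):
    """Keep the features chosen by more than half of the selectors."""
    votes = Counter(feature for chosen in selections for feature in chosen)
    threshold = len(selections) / 2
    return sorted(f for f, n in votes.items() if n > threshold)


# Hypothetical relevance scores from three different feature selection techniques.
scores_a = {"loc": 0.9, "depth": 0.8, "calls": 0.1, "age": 0.3, "churn": 0.2}
scores_b = {"loc": 0.7, "depth": 0.2, "calls": 0.1, "age": 0.9, "churn": 0.3}
scores_c = {"loc": 0.6, "depth": 0.9, "calls": 0.2, "age": 0.1, "churn": 0.5}

# A ratio of 0.4 is used here only because the toy feature set is tiny.
selections = [top_ratio(s, ratio=0.4) for s in (scores_a, scores_b, scores_c)]
selected = majority_vote(selections)
print(selected)
```

In this toy setup, a feature that only one technique ranks highly is dropped, while features favored by at least two of the three techniques are kept. With a realistic feature set, `ratio` would be set to the value under study, such as the 20% reported above.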
