Abstract
In defect prediction studies, open-source and real-world defect data sets are frequently used. The quality of these data sets is one of the main factors affecting the validity of defect prediction methods. One of the issues is repeated data points in defect prediction data sets. The main goal of the paper is to explore how low-level metrics are derived. This paper also presents a cleansing algorithm that removes repeated data points from defect data sets. The method was applied on 20 data sets, including five open source sets, and area under the curve (AUC) and precision performance parameters have been improved by 4.05% and 6.7%, respectively. In addition, this work discusses how static code metrics should be used in bug prediction. The study provides tips to obtain better defect prediction results.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.