Abstract

Abstract: Building a prediction model for mining software repositories requires labeling a large amount of data, and the correctness of those labels has a significant impact on a model's performance. However, few studies have investigated how mislabeled instances affect a prediction model. To close this gap, we conduct a study in the context of security bug report (SBR) prediction, where mislabeled data has the potential to mislead SBR prediction research. We first improve the label correctness of five publicly available datasets by manually examining every bug report, and we discover 749 SBRs that were previously mislabeled as non-security bug reports (NSBRs). We then compare the performance of classification models on the noisy (before our correction) and the cleaned (after our correction) datasets to assess the impact of dataset label correctness. The results suggest that cleaning the datasets improves the performance of classification models.

Index Terms: Security bug report prediction, data quality, software defect detection and reporting
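The evaluation the abstract describes can be pictured with a minimal sketch (not the authors' code): train the same text classifier once on the noisy labels and once on the corrected labels, then score both against the corrected labels on a held-out split. The file name and the summary, label_noisy, and label_clean columns are hypothetical placeholders.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Hypothetical dataset: one row per bug report, with the report text and
    # two label columns (1 = SBR, 0 = NSBR): the original noisy label and the
    # label after manual correction.
    df = pd.read_csv("bug_reports.csv")

    train_df, test_df = train_test_split(
        df, test_size=0.3, random_state=42, stratify=df["label_clean"]
    )

    # Represent report text with TF-IDF features, fit on the training split only.
    vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
    X_train = vectorizer.fit_transform(train_df["summary"])
    X_test = vectorizer.transform(test_df["summary"])

    # Train one model per label set; evaluate both against the cleaned labels,
    # which the study treats as ground truth.
    for label_col in ("label_noisy", "label_clean"):
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, train_df[label_col])
        predictions = model.predict(X_test)
        print(f"trained on {label_col}: "
              f"F1 = {f1_score(test_df['label_clean'], predictions):.3f}")

A gap between the two F1 scores would illustrate the abstract's claim: label noise in the training data degrades the classifier, and cleaning the labels recovers performance.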
