Abstract
In recent years, software defect prediction researchers have proposed Just-in-Time (JIT) defect prediction, which predicts whether each code change submitted by a developer is likely to introduce a defect. Because predictions are made per change, they are timely and easy to trace back to the responsible change. However, the accuracy of JIT defect prediction suffers from class imbalance in the data sets. In software engineering, defects tend to be concentrated in a small part of the system (the Pareto principle: roughly 80% of defects reside in about 20% of the modules), and in most projects code changes that do not introduce defects far outnumber those that do. This imbalance between the minority class (defect-inducing changes) and the majority class (clean changes) degrades the model's classification performance. Because clean changes dominate, a model can reach an artificially high accuracy while still failing to deliver the expected results in practice. In addition, the data sets contain many irrelevant and redundant features, which increase the complexity of the prediction model. To improve the efficiency of JIT defect prediction, and to make the model more interpretable and transparent so that users can come to trust its decisions, we built a Random Forest defect prediction model and used several different types of change features to study six open-source projects from different domains. We explain the model, to a certain extent, with the LIME interpretability technique, and use these explanations to extract key features so as to reduce developers' workload as much as possible. Our results show that by interpreting the defect prediction model and identifying its key features, 96% of the original effectiveness can be achieved with only 45% of the original workload.
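The claim that the majority class inflates accuracy can be illustrated with a small, self-contained Python sketch. It assumes a hypothetical data set of 90 clean changes and 10 defect-inducing ones (label 1 marks a defect-inducing change) and two made-up predictors. None of these numbers come from the study's six projects, and the `evaluate` helper and `Metrics` class are illustrative names, not part of the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    recall: float     # share of defect-inducing changes that the model flags
    precision: float  # share of flagged changes that really are defect-inducing


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Score binary predictions, treating label 1 (defect-inducing) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return Metrics(
        accuracy=correct / len(pairs),
        recall=tp / (tp + fn) if tp + fn else 0.0,
        precision=tp / (tp + fp) if tp + fp else 0.0,
    )


# Hypothetical data set: 90 clean changes (0) followed by 10 defect-inducing ones (1).
y_true = [0] * 90 + [1] * 10

# Hypothetical predictor A always answers "clean", i.e. only the majority class.
always_clean = [0] * 100

# Hypothetical predictor B flags 15 changes: 7 clean ones (false alarms)
# and 8 of the 10 defect-inducing ones; the last 2 defects are missed.
flags_some = [0] * 83 + [1] * 7 + [1] * 8 + [0] * 2

for name, y_pred in [("always_clean", always_clean), ("flags_some", flags_some)]:
    m = evaluate(y_true, y_pred)
    print(f"{name}: accuracy={m.accuracy:.2f} recall={m.recall:.2f} precision={m.precision:.2f}")

# prints:
# always_clean: accuracy=0.90 recall=0.00 precision=0.00
# flags_some: accuracy=0.91 recall=0.80 precision=0.53
```

The predictor that never flags a change scores 90% accuracy yet catches none of the defects, while the one that catches 8 of the 10 defects gains only a single point of accuracy. This is why imbalance-aware metrics such as recall, precision, or F1 are commonly reported alongside accuracy when evaluating JIT defect prediction models.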