Abstract

In current software defect prediction (SDP) research, most empirical studies use only data sets from the Promise repository, which poses a threat to external validity. Sharing SDP models instead of SDP data sets is a potential way to alleviate this problem and can encourage researchers to share more models. However, sharing models directly may leak private information, for example through a model inversion attack. Because differential privacy (DP) mechanisms can prevent this attack when the privacy budget is carefully selected, we apply DP to SDP model sharing and propose a novel method, A-DPRF; to the best of our knowledge, we are the first to do so. The method first preprocesses the data set, for example by over-sampling minority instances (i.e., faulty modules) and discretizing continuous features. It then uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees from these training sets, and these decision trees form a random forest (i.e., the model). The last two steps of A-DPRF use the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose experimental subjects from real software projects, use AUC as the performance measure, and use holdout as the model validation technique. After privacy and utility analysis, we find that A-DPRF achieves better performance than a baseline method, B-DPRF, in most cases under the same privacy budget.
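
The two DP building blocks named in the abstract can be illustrated with a minimal sketch. This is not the A-DPRF algorithm itself: the abstract does not state which quantities are perturbed, which utility function scores candidate splits, or how the privacy budget is divided among trees. The function names (`noisy_count`, `exponential_mechanism`), the sensitivity default of 1, and the example feature names are all assumptions made for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: perturb a count (e.g. a class count in a leaf)
    # with noise of scale sensitivity / epsilon.
    # Assumption: adding or removing one record changes the count by at most 1.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def exponential_mechanism(scores, epsilon, rng, sensitivity=1.0):
    # Exponential mechanism: pick a candidate (e.g. a split feature) with
    # probability proportional to exp(epsilon * score / (2 * sensitivity)).
    # `scores` maps each candidate to a utility value (hypothetical here).
    candidates = list(scores)
    best = max(scores.values())
    # Subtracting the maximum score keeps exp() from overflowing and does
    # not change the resulting probabilities.
    weights = [
        math.exp(epsilon * (scores[c] - best) / (2.0 * sensitivity))
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For instance, `exponential_mechanism({"loc": 0.8, "cc": 0.5}, epsilon=0.5, rng=random.Random(1))` returns one of the two feature names, favoring the higher-scoring one more strongly as `epsilon` grows. A smaller `epsilon` gives stronger privacy but noisier counts and more random split choices, which is the privacy/utility trade-off the abstract refers to.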

