HOPE: Software Defect Prediction Model Construction Method via Homomorphic Encryption

Chi Yu, Zixuan Ding, Xiang Chen

doi:10.1109/access.2021.3078265

Abstract

Software defect prediction identifies potentially defective modules in a project in advance, which helps optimize the allocation of test resources. Recently, privacy protection for datasets and models has gradually attracted researchers' attention. In this study, we are the first to apply homomorphic encryption to software defect prediction model construction, and we propose a novel method, HOPE. Specifically, we adopt an algorithm approximation strategy to approximate the sigmoid function and select the Paillier homomorphic encryption algorithm for Logistic regression. In our case study, we choose the MORPH dataset, gathered from real-world open-source projects, as our experimental subject. We then design three control groups to simulate three scenarios that differ in whether the client sends encrypted data to the server and whether the server uses the HOPE method. The final results show that if the server uses the original Logistic regression to construct the model on the encrypted data, the performance of the trained model is similar to random guessing, which shows that the privacy of the data is protected. Moreover, compared with the original Logistic regression method, HOPE incurs only a small additional computational cost, with no obvious decrease in performance. We share our implementation scripts and datasets to encourage researchers to conduct more studies in this research direction.
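To make the two ingredients named above concrete, the following minimal sketch shows (i) a toy Paillier cryptosystem, whose additive homomorphism lets a server compute a linear score on encrypted features, and (ii) a polynomial stand-in for the sigmoid. It is not the HOPE implementation. The small primes, the fixed-point scale `SCALE`, the helper names, and the first-order Taylor approximation `0.5 + 0.25 * z` are all assumptions chosen for illustration; this section does not specify which approximation HOPE uses.

```python
import math
import random

# Toy Paillier keys (ASSUMPTION: tiny, insecure primes for illustration only).
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse; requires Python 3.8+

SCALE = 1000  # hypothetical fixed-point scale for real-valued inputs


def encrypt(m):
    """Encrypt a (possibly negative) integer m under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt c and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m


def add(c1, c2):
    """Homomorphic addition: Dec(add(c1, c2)) = m1 + m2."""
    return (c1 * c2) % N2


def mul_plain(c, k):
    """Multiply an encrypted value by a plaintext integer k."""
    return pow(c, k % N, N2)


def encrypted_score(enc_x, w_int):
    """Compute Enc(sum_i w_i * x_i) from encrypted features and plaintext weights."""
    acc = encrypt(0)
    for c, w in zip(enc_x, w_int):
        acc = add(acc, mul_plain(c, w))
    return acc


def approx_sigmoid(z):
    """First-order Taylor approximation of the sigmoid around z = 0 (assumption)."""
    return 0.5 + 0.25 * z


# Client side: encode and encrypt the features.
x = [1.5, -2.0]
enc_x = [encrypt(round(v * SCALE)) for v in x]

# Server side: plaintext weights, encrypted features.
w = [0.4, 0.1]
w_int = [round(v * SCALE) for v in w]
enc_z = encrypted_score(enc_x, w_int)

# Client side: decrypt, rescale, and apply the approximated sigmoid.
z = decrypt(enc_z) / (SCALE * SCALE)
prob = approx_sigmoid(z)
```

Because Paillier supports only addition of ciphertexts and multiplication by plaintext constants, the exponential in the exact sigmoid cannot be evaluated on encrypted data, which is why a polynomial approximation is needed. The linear form used here is accurate only near `z = 0`.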

Highlights

Software defect prediction (SDP) [20] is an active research topic in the domain of mining software repositories.
We introduce the background of the homomorphic encryption algorithm and Logistic regression, since our proposed method HOPE focuses only on privacy protection via homomorphic encryption and uses Logistic regression as the chosen classifier.
Since HOPE is based on homomorphic encryption, an increase in model construction time is inevitable.

Summary

Introduction

Software defect prediction (SDP) [20] is an active research topic in the domain of mining software repositories. A defect prediction model is a statistical regression model or a machine learning classifier trained to identify defective modules. Machine learning is commonly used to construct defect prediction models after mining software repositories (such as version control systems, bug tracking systems, and developers' emails) [20] [49] [26]. The constructed models can predict the defect-proneness or the number of defects of new program modules in the project.

A. SOFTWARE DEFECT PREDICTION

Software defect prediction [30] [34] [14] [13] is one of the most active research topics in previous studies in the mining software repositories domain. By using a machine learning method, defective modules can be identified in advance, and software quality assurance resources are allocated for

Objectives

Results

Conclusion