DPRF: A Differential Privacy Protection Random Forest

Jun Hou, Zhen Ni, Shunmei Meng, Yaozong Liu, Yini Chen, Qianmu Li

doi:10.1109/access.2019.2939891

Abstract

Providing privacy protection for classification algorithms has become a research hotspot in data mining. This paper applies differential privacy to the random forest classification algorithm and proposes a differentially private random forest that protects private information during data classification. Differential privacy provides protection by adding perturbation noise, which lowers the classification accuracy of random forest algorithms. First, to reduce this accuracy loss, a hybrid decision tree algorithm is proposed: when building a single decision tree in the forest, the information gain used by the ID3 algorithm and the information gain ratio used by C4.5 are combined into a new attribute metric, IG_GR, to improve the classification accuracy of a single decision tree. Second, a new privacy budget allocation strategy is proposed: for nodes at different depths of the decision tree, the privacy budget is allocated by weight to each node's counting query and attribute query, which balances the signal-to-noise ratio that differential privacy yields at different depths. The hybrid decision tree algorithm is then used to build the random forest, balancing the privacy and classification accuracy of the differentially private random forest. Finally, experiments were conducted on the UCI Adult and Mushroom datasets. The results show that the proposed algorithm achieves better classification accuracy than the traditional decision tree algorithm, and that DPRF provides effective privacy protection while maintaining high classification performance. The work achieves a balance between privacy and classification accuracy and has practical application value.
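As a rough illustration of the two attribute metrics the abstract refers to, the sketch below computes the information gain (as used by ID3) and the information gain ratio (as used by C4.5) for one discrete attribute. It does not implement IG_GR, because the abstract does not say how the two metrics are combined. The function names and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) for splitting on one attribute.

    values: the attribute value of each record
    labels: the class label of each record
    """
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Toy data (hypothetical): the attribute separates the classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
gain, ratio = info_gain_and_ratio(outlook, play)
print(gain, ratio)
```

Information gain tends to favor attributes with many distinct values, while the gain ratio normalizes by the split information, which is presumably why a metric drawing on both is of interest.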

Highlights

With the rapid development of Internet technology, in addition to the government, many companies have a huge amount of data about citizens’ personal information.
This paper proposes Differential Privacy Random Forest (DPRF), a new random forest classification algorithm based on differential privacy protection.
Proof: According to the strategy of dividing the privacy budget by the weight of each layer of the decision tree, the weight of the privacy protection budget assigned to the first layer is w1 = 2 dm, and the actual privacy protection budget obtained by the root node according to the unit share of the privacy protection budget is e1 = eu ∗ (2 dm); the privacy protection budget corresponding to the [...]
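The per-layer allocation described in the proof fragment can be sketched as follows: a unit share of the budget is obtained by dividing the total budget by the sum of the layer weights, and each layer receives the unit share multiplied by its weight. The weight values used here are hypothetical placeholders, since the paper's actual weights cannot be recovered from this excerpt, and the function name is an assumption as well.

```python
def split_budget_by_depth(total_epsilon, weights):
    """Split a total privacy budget across tree layers in proportion to weights.

    weights[i] is the weight of layer i, root first. The values passed in
    below are hypothetical and are not the weights defined in the paper.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    unit = total_epsilon / sum(weights)  # unit share of the privacy budget
    return [unit * w for w in weights]


# Hypothetical weights for a tree with three layers.
weights = [4, 2, 1]
budgets = split_budget_by_depth(1.0, weights)
for depth, eps in enumerate(budgets, start=1):
    print(f"layer {depth}: epsilon = {eps:.4f}")
```

The reason the layer budgets should sum to the total is sequential composition: every record passes through one node per layer on its path from root to leaf, so the budgets spent along that path add up. Nodes in the same layer hold disjoint subsets of records, so under parallel composition they can each use the full share of their layer.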

Summary

INTRODUCTION

With the rapid development of Internet technology, many companies, in addition to the government, hold huge amounts of data about citizens’ personal information. The SuLQ-based ID3 algorithm uses the Laplace mechanism to add noise each time the information gain of a dataset attribute is calculated while constructing the decision tree [19], [20]. This introduces excessive noise, leaving its prediction accuracy 30% lower than that of the ID3 algorithm without differential privacy protection [21]. The algorithm does not need to preprocess the dataset; that is, it does not need to discretize continuous attributes before constructing the decision tree. It extends the DiffPRF algorithm, which has a monotonous privacy budget allocation strategy and can only deal with discrete attributes, by using the exponential mechanism to select split points for continuous attributes [30], [31].
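The two noise mechanisms mentioned above can be sketched in a few lines using only the Python standard library. This is a generic illustration of the Laplace and exponential mechanisms, not the code of any of the cited algorithms. The candidate split points, their scores, and the sensitivity value of 1.0 are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query, whose sensitivity is 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the maximum score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)

# Noisy class count in a leaf node.
print(noisy_count(120, epsilon=0.5, rng=rng))

# Hypothetical candidate split points for a continuous attribute and
# hypothetical quality scores (for example, information gain).
split_points = [25.0, 35.0, 45.0]
scores = [0.10, 0.42, 0.05]
print(exponential_mechanism(split_points, scores, 1.0, 1.0, rng))
```

A smaller epsilon gives a larger noise scale for the count and a flatter selection distribution for the split point, which is the accuracy cost that motivates careful budget allocation.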

HYBRID DECISION TREE

EXPERIMENTS AND ANALYSIS

EXPERIMENT 1

EXPERIMENT 2

EXPERIMENT 3

EXPERIMENT 4

Findings

CONCLUSION


Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 29	License type: CC BY 4.0

