Abstract

The random forest classification algorithm is a data mining algorithm with high classification accuracy, but building a random forest can leak personal privacy. To address the low prediction accuracy and unsatisfactory classification results of existing privacy-preserving random forest classification algorithms, a high-accuracy differentially private random forest classification algorithm (DPFMaxTree) is proposed. First, building on the existing Max attribute measure, a new attribute measure, F_Max, is designed to improve the classification accuracy of a single decision tree. Second, in combination with a new privacy budget allocation mechanism, datasets with continuous attributes are discretized by the exponential mechanism using the CART algorithm, the Laplace mechanism is then used to add noise, and the F_Max attribute measure is used to construct the DPFMaxTree classification algorithm. Experimental results on real datasets show that the new algorithm can perform classification predictions well under a reasonable privacy budget.
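
The two differential privacy primitives named in the abstract can be illustrated with a minimal Python sketch. This is not the DPFMaxTree implementation: the F_Max measure and the privacy budget allocation are not specified in the abstract and are not reproduced here. The function names (`noisy_count`, `exponential_mechanism`), the candidate split points and their scores, and the default sensitivity of 1 are all assumptions made for illustration.

```python
import math
import random


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to a count.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def exponential_mechanism(candidates, scores, epsilon, rng, sensitivity=1.0):
    """Exponential mechanism: pick a candidate with probability proportional
    to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift scores so the largest exponent is 0 (numerical stability)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical split points for one continuous attribute, each with a
    # hypothetical quality score (a CART-style impurity gain, for example).
    split_points = [2.5, 4.0, 6.5]
    quality = [0.10, 0.35, 0.20]
    chosen = exponential_mechanism(split_points, quality, epsilon=1.0, rng=rng)
    # Hypothetical class count at a leaf, released with Laplace noise.
    released = noisy_count(42, epsilon=0.5, rng=rng)
    print(chosen, released)
```

A smaller `epsilon` widens the Laplace noise and flattens the selection probabilities, which is the privacy-accuracy trade-off that a budget allocation scheme has to balance across the trees and levels of the forest.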
