Abstract

Traditional correlated differential privacy techniques usually introduce excessive noise, which reduces data utility. In addition, machine learning often involves training sets of high-dimensional data, which incur heavy computational overhead. To address the first issue, we design a more reasonable correlation analysis method. This method combines feature matching algorithms with information entropy-based feature importance to accurately calculate the degree of correlation between records, which reduces data correlation and correlated sensitivity and improves the data’s utility. It is a novel method for evaluating record correlation that alleviates the limitations of traditional correlation calculation methods. To address the second issue, we build on this method to provide a data release solution that combines the maximal information coefficient with differential privacy, reducing data dimensionality and improving the training efficiency of machine learning. Furthermore, we introduce an optimization algorithm based on mutual information that selects the best principal components, improving the efficiency of our data release solution. To demonstrate the proposed solution’s effectiveness and performance compared with existing schemes, we conducted experiments on three real-world datasets. The experimental results show that our scheme reduces data correlation by up to 80% compared with existing schemes. Moreover, for the same privacy budget, it improves machine learning accuracy by 10% to 20%.
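
To see why lowering correlated sensitivity reduces noise, here is a minimal sketch of how a correlated sensitivity feeds into the Laplace mechanism for a count query. It is not the authors' algorithm. It assumes the common formulation from the correlated differential privacy literature, where the sensitivity for record i is the sum of |delta[i][j]| over all records j, multiplied by the per-record sensitivity, and the overall sensitivity is the maximum over i. The correlated-degree matrix delta is taken as given; the feature matching and entropy-based weighting that the abstract uses to compute it are not reproduced here. The names correlated_sensitivity, laplace_noise, noisy_count and the threshold parameter are hypothetical.

```python
import random
from typing import Callable, Iterable, Sequence


def correlated_sensitivity(delta: Sequence[Sequence[float]],
                           record_sensitivity: float = 1.0,
                           threshold: float = 0.0) -> float:
    """Correlated sensitivity under an assumed common definition (not the paper's).

    delta[i][j] in [-1, 1] is the assumed correlated degree between records
    i and j, with delta[i][i] == 1. Off-diagonal degrees whose magnitude is
    below the hypothetical `threshold` are treated as uncorrelated.
    """
    best = 0.0
    for i, row in enumerate(delta):
        # Sum |delta_ij| over the records that are considered correlated with record i.
        cs_i = sum(abs(d) for j, d in enumerate(row) if i == j or abs(d) >= threshold)
        best = max(best, cs_i * record_sensitivity)
    return best


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records: Iterable, predicate: Callable[[object], bool],
                delta: Sequence[Sequence[float]], epsilon: float,
                threshold: float = 0.0, seed=None) -> float:
    """Count query released with Laplace noise scaled by correlated sensitivity / epsilon."""
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    cs = correlated_sensitivity(delta, record_sensitivity=1.0, threshold=threshold)
    return true_count + laplace_noise(cs / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical correlated degrees among three records.
    delta = [[1.0, 0.6, 0.1],
             [0.6, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
    print(correlated_sensitivity(delta))                 # about 1.8
    print(correlated_sensitivity(delta, threshold=0.3))  # about 1.6
    print(noisy_count([1, 2, 3], lambda r: r > 1, delta, epsilon=1.0, seed=0))
```

With independent records (delta equal to the identity matrix), the correlated sensitivity collapses to the ordinary sensitivity of 1. Any method that estimates smaller or sparser correlated degrees therefore shrinks the Laplace scale for a fixed epsilon, which is the lever the abstract describes.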
