Abstract
In recent years, privacy preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high dimensional, which makes them challenging for many privacy preservation algorithms. However, some domains of such data have special properties that make sketch based techniques particularly useful. In this paper, we present a new method for privacy preserving data mining of text and binary data using a sketch based approach. The special property we exploit is sparsity: only a small percentage of the attributes have non-zero values. We formalize an anonymity model for the sketch based approach and use it to construct sketch based privacy preserving representations of the original data. This representation allows accurate computation of a number of important data mining primitives such as the dot product. It can therefore be used for a variety of data mining algorithms such as clustering and classification. We illustrate the effectiveness of our approach on a number of real and synthetic data sets, and show that the accuracy of data mining algorithms is preserved by the transformation even as data dimensionality increases.
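To make the idea of a dot-product-preserving sketch concrete, the following is a minimal illustration using a generic random ±1 projection. It is not the construction or the anonymity model of the paper, and it makes no privacy claim. The names `sign`, `sketch`, and `estimate_dot`, the sketch size `k`, and the seeding scheme are all assumptions introduced for this example. Sparse records are stored as dictionaries holding only their non-zero attributes, so the cost of building a sketch depends on the number of non-zero values rather than on the full dimensionality.

```python
import random


def sign(seed, j, i):
    """Pseudo-random +1 or -1 for sketch component j and attribute index i.

    Hypothetical helper: the value is derived deterministically from
    (seed, j, i), so every record is sketched with the same random signs.
    """
    return 1 if random.Random(f"{seed}:{j}:{i}").getrandbits(1) else -1


def sketch(sparse_vec, k, seed=0):
    """Compress a sparse vector {index: value} into k sketch components.

    Component j is the sum of the non-zero values, each multiplied by
    the random sign assigned to (j, index).
    """
    return [
        sum(value * sign(seed, j, i) for i, value in sparse_vec.items())
        for j in range(k)
    ]


def estimate_dot(sketch_x, sketch_y):
    """Estimate the dot product of the original vectors from their sketches.

    The average of component-wise products equals the true dot product
    in expectation; the error shrinks as the sketch size k grows.
    """
    return sum(a * b for a, b in zip(sketch_x, sketch_y)) / len(sketch_x)


if __name__ == "__main__":
    # Two sparse binary records sharing 5 non-zero attributes.
    x = {i: 1 for i in range(0, 10)}
    y = {i: 1 for i in range(5, 15)}
    sx, sy = sketch(x, k=1000), sketch(y, k=1000)
    print("true dot product:", sum(x[i] * y.get(i, 0) for i in x))
    print("sketch estimate: ", estimate_dot(sx, sy))
```

A larger `k` gives a more accurate estimate at the cost of a longer sketch. How the sketch size relates to the anonymity level is what the paper's model formalizes, and this example does not cover it.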