Abstract

ABSTRACT Clustering is a data mining technique that has been effectively used in the last few decades for knowledge extraction. Privacy is a major problem while releasing data for clustering and therefore privacy-preserving data mining (PPDM) algorithms have been developed. Aggregation is a popular PPDM technique that has been used. However, in the last few years, certain applications require that data mining be performed on high-dimensional data. The present privacy preservation techniques perform aggregation in a univariate manner along each dimension. This affects the utility measures, information measures, and especially retention of original clusters. This paper proposes a new technique called as subspace-based aggregation (SBA). SBA categorizes the dimensions into dense and non-dense subspaces based on the density of points. Aggregation is performed separately for dense and non-dense subspaces. This approach helps to maximize utility measures, information measures, and retention of clusters. SBA is run on high-dimensional continuous datasets from UCI Machine Learning repository. SBA is compared with related work methods such as SINGLE, SIMPLE, MDAV, and PPPCA. SBA provides an improvement of 66% in utility, 400% in cluster identification, 5% in co-variance, and standard deviation.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.