Abstract

Privacy-Preserving Data Mining (PPDM) is a relatively new domain of data mining. In this technique, different parties combine their data with data from other, possibly unknown, parties to find a common data-intelligence solution. However, the resulting data models and their outcomes vary with several factors. This paper investigates the impact of dimensionality and noise on classifier performance in PPDM systems. First, a high-dimensional dataset is taken from KDD Cup for experimentation, and dimensionality reduction techniques based on PCA (principal component analysis), KPCA (kernel principal component analysis), and CRC (correlation coefficient) are applied. To demonstrate the impact of noise, we first study the types of noise used in PPDM systems and use random noise to manipulate the data. In doing so, we found that random noise does not work for categorical attributes. Therefore, an extended controlled noise algorithm is introduced, which generates a new dataset from the original data without disturbing the data utility. To verify this, two supervised learning classifiers, C4.5 and CART, are implemented, and five publicly available datasets are used for the experiments. According to the results, classical random noise strongly affects classifier accuracy in an uncontrolled manner. Data manipulated with controlled noise affects classifier performance less: the difference between classification performance on the original dataset and on the controlled-noise dataset is small. However, controlled noise increases time and memory usage because all the categorical attributes are altered.
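The contrast between random noise on numeric attributes and a controlled alteration of categorical attributes can be illustrated with a minimal sketch. The abstract does not describe the extended controlled noise algorithm, so the code below is not the authors' method: `relabel_categories` is a hypothetical stand-in that replaces each category with a consistent opaque token, and the column names and values are made-up toy data.

```python
import random


def add_random_noise(values, scale, rng):
    """Additive uniform noise in [-scale, scale] for a numeric column.

    This only makes sense for numbers; a label such as "tcp" cannot be
    shifted by a small random amount.
    """
    return [v + rng.uniform(-scale, scale) for v in values]


def relabel_categories(values, rng):
    """Hypothetical controlled perturbation for a categorical column.

    Every distinct category is replaced by an opaque token through a
    one-to-one mapping, so equal values stay equal and distinct values
    stay distinct. This is an illustrative assumption, not the paper's
    algorithm.
    """
    categories = sorted(set(values))
    rng.shuffle(categories)
    mapping = {cat: f"v{i}" for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


rng = random.Random(0)            # fixed seed for repeatability
duration = [0.0, 2.0, 5.0, 1.0]   # toy numeric attribute
protocol = ["tcp", "udp", "tcp", "icmp"]  # toy categorical attribute

noisy_duration = add_random_noise(duration, scale=0.5, rng=rng)
masked_protocol, mapping = relabel_categories(protocol, rng)
```

Because the relabeling is one-to-one, any partition of the records by category is unchanged, which is one way an alteration of categorical data can leave a tree classifier's splits intact while hiding the original labels. Building and applying a mapping for every categorical attribute also costs extra time and memory, in line with the overhead reported above.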
