Abstract

The digital era, marked by the unrivalled growth of the Internet and its services and by day-to-day technological advancements, has paved the way for a data-driven society. This digital explosion offers opportunities to extract valuable information from collected data, which organizations and research establishments use for synergistic advantage. However, the privacy of data divulged online is often overlooked as a consequence of such large-scale analytics. Although privacy and security practices jointly determine the ethics of data collection and use, the personal data of individuals remains largely at risk of disclosure. Considerable research has gone into privacy-preserving analytics in light of the Big Data and IoT boom, but scalable and efficient techniques that do not compromise the usefulness of privacy-constrained data continue to be a challenging area of research. The proposed work uses a distance-based perturbation method to group the data and then further randomizes it. The utility of the perturbed data is evaluated on a classification task, which gives results on par with those for its non-perturbed counterpart. The relative performance of the algorithm is also evaluated on the parallel computing platform Spark. Results show that the technique does not hinder the use of the data for holistic analysis while privacy is subjectively maintained.

