M-sanit: Computing misusability score and effective sanitization of big data using Amazon elastic MapReduce

M Nagaratna, Y Sowmya

doi:10.1109/iccpeic.2017.8290334

Abstract

The advent of distributed programming frameworks such as Hadoop paved the way for processing voluminous data, known as big data. Owing to the exponential growth of data, enterprises have started to exploit cloud infrastructure for storing and processing big data. Insider attacks on outsourced data cause leakage of sensitive data. It is therefore essential to sanitize data so as to preserve privacy, that is, to prevent disclosure of sensitive data. Privacy Preserving Data Publishing (PPDP) and Privacy Preserving Data Mining (PPDM) are the areas in which data sanitization plays a vital role in preserving privacy. The existing anonymization techniques for MapReduce programming lack a misusability measure for determining the level of sanitization to be applied to big data. To overcome this limitation, we propose a framework known as M-Sanit, which computes the misusability score of big data before performing sanitization using the MapReduce programming paradigm. Our empirical study on a real-world cloud ecosystem, namely Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR), demonstrates the effectiveness of misusability-score-based sanitization of big data prior to publishing or mining it.
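
The idea of scoring misusability first and then choosing a sanitization level can be illustrated with a minimal sketch in plain Python that mimics a map phase and a reduce phase. Everything in it is an assumption made for illustration and is not taken from the paper: the `SENSITIVITY` weights, the field names `zip`, `age` and `diagnosis`, the toy scoring formula, and the 0.5 threshold are all hypothetical, and the abstract does not specify how M-Sanit actually defines its score.

```python
from collections import defaultdict

# Hypothetical sensitivity weights per sensitive value (not from the paper).
SENSITIVITY = {"HIV": 1.0, "cancer": 0.8, "flu": 0.1}


def map_record(record):
    """Map phase: emit (quasi-identifier key, sensitivity weight) for one record."""
    key = (record["zip"], record["age"])
    return key, SENSITIVITY.get(record["diagnosis"], 0.0)


def reduce_weights(pairs):
    """Reduce phase: group the emitted weights by quasi-identifier key."""
    groups = defaultdict(list)
    for key, weight in pairs:
        groups[key].append(weight)
    return groups


def misusability_score(groups):
    """Toy score (an assumption): the largest value of
    mean sensitivity / group size, so small groups that hold
    sensitive values score highest."""
    return max(
        (sum(ws) / len(ws)) / len(ws) for ws in groups.values()
    )


def sanitize(record, score, threshold=0.5):
    """Generalize quasi-identifiers only when the score reaches the threshold."""
    if score < threshold:
        return dict(record)
    decade = record["age"] // 10 * 10
    return {
        "zip": record["zip"][:3] + "**",      # keep a 3-digit ZIP prefix
        "age": f"{decade}-{decade + 9}",      # replace age by its decade
        "diagnosis": record["diagnosis"],
    }


records = [
    {"zip": "50001", "age": 34, "diagnosis": "HIV"},
    {"zip": "50001", "age": 34, "diagnosis": "flu"},
    {"zip": "50002", "age": 51, "diagnosis": "cancer"},
]

score = misusability_score(reduce_weights(map_record(r) for r in records))
sanitized = [sanitize(r, score) for r in records]
```

In this toy data set the single-record group drives the score above the assumed threshold, so every record is generalized. On a real cluster the map and reduce functions would run as distributed tasks rather than as local function calls.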
