Anonymization Based Fisher–Yates Shuffle Method for Streaming of Twitter Data

doi:10.35940/ijrte.b1397.0882s819

Abstract

In the era of Big Data, many organizations work with personal data that must be protected for privacy reasons. Individuals can be re-identified from their records through Quasi-Identifiers (QIs). To preserve privacy, anonymization converts personal data into data that can no longer be linked to an individual. Many organizations produce large volumes of data in real time. With Hadoop components such as HDFS and MapReduce, together with the wider Hadoop ecosystem, such volumes can be processed in real time. Basic data anonymization techniques include cryptographic methods, substitution, character masking, shuffling, nulling out, date variance and number variance. Here, privacy preservation for streaming data is achieved by applying one of these techniques, shuffling, in a Big Data setting. K-anonymity, l-diversity and t-closeness are the techniques commonly used to address privacy concerns in data, but all of them incur information loss and do not preserve data utility well. The Dynamically Anonymizing Data Shuffling (DADS) technique is used to reduce this information loss and to improve data utility in streaming data.
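As a rough illustration of shuffling as an anonymization step, the sketch below applies a Fisher–Yates shuffle to a single quasi-identifier column within one batch of records. It is a generic example and not the DADS technique itself, which the abstract does not specify; the function names `fisher_yates_shuffle` and `shuffle_column`, the record layout, and the field names `user`, `location` and `text` are all assumptions made for illustration.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    out = list(items)
    # Walk from the last index down to 1, swapping each position with a
    # uniformly chosen position at or before it.
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out


def shuffle_column(records, column, rng=None):
    """Shuffle the values of one column across a batch of records.

    Hypothetical helper: each record is a dict, and only the chosen
    quasi-identifier column is permuted; other fields stay in place.
    """
    rng = rng or random.Random()
    shuffled = fisher_yates_shuffle([r[column] for r in records], rng)
    return [{**r, column: value} for r, value in zip(records, shuffled)]


if __name__ == "__main__":
    # Illustrative batch; the field names are assumptions, not from the paper.
    batch = [
        {"user": "u1", "location": "Chennai", "text": "tweet a"},
        {"user": "u2", "location": "Mumbai", "text": "tweet b"},
        {"user": "u3", "location": "Delhi", "text": "tweet c"},
    ]
    for row in shuffle_column(batch, "location", random.Random(7)):
        print(row)
```

Because the shuffle only permutes values within the batch, the column's overall distribution is unchanged while the link between each record and its original value is broken. For streaming input, one plausible arrangement is to call `shuffle_column` on each window or micro-batch as it arrives.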

Full Text