Abstract

Recent years have seen transactional datasets become widely available for mining and other research activities. A primary concern when such datasets are shared publicly is the risk of re-identifying the individuals whose data are published. Data anonymization is a commonly used privacy-preservation method for preventing user identification. However, existing anonymization models for transactional data, such as k^m-anonymity, ρ-uncertainty, and (h, k, p)-coherence, do not protect against all of the possible types of privacy attack. This article therefore proposes a novel privacy model, (k, m, t)-anonymity, that prevents identity disclosure, attribute disclosure, and skewness attacks on transactional data. A genetic algorithm-based implementation of the model is also presented. The genetic algorithm clusters transactions by their similarity so that the data can be anonymized with low information loss. At the same time, the clustering algorithm aims to minimize the skewness of the data distribution within each cluster, which prevents skewness attacks on the anonymized data. The model was applied, using the proposed approach, to two real-world datasets (health-domain and click-stream data) and to a large synthetic health-domain dataset of 500,000 records. The experimental results show that the (k, m, t)-anonymity model anonymizes transactional data without significant information loss: its relative error is lower than that of the relative privacy and disassociation techniques in all test scenarios. Hence, the proposed anonymization model maintains data utility.
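To make the two clustering goals concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that transactions are sets of items, that similarity is measured with the Jaccard index, and that skewness is judged by comparing how often each sensitive item occurs within a cluster against how often it occurs in the whole dataset. The function names, the parameters `k` and `t` as used here, and the acceptance rule are all illustrative assumptions; the abstract does not define them.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two transactions (sets of items)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sensitive_distribution(transactions, sensitive_items):
    """Fraction of transactions that contain each sensitive item."""
    n = len(transactions)
    if n == 0:
        return {s: 0.0 for s in sensitive_items}
    counts = Counter(
        item for t in transactions for item in t if item in sensitive_items
    )
    return {s: counts[s] / n for s in sensitive_items}


def cluster_is_acceptable(cluster, all_transactions, sensitive_items, k, t):
    """Hypothetical acceptance rule (an assumption, not the paper's definition):
    the cluster holds at least k transactions, and the frequency of every
    sensitive item in the cluster is within t of its dataset-wide frequency."""
    if len(cluster) < k:
        return False
    local = sensitive_distribution(cluster, sensitive_items)
    overall = sensitive_distribution(all_transactions, sensitive_items)
    return all(abs(local[s] - overall[s]) <= t for s in sensitive_items)


# Toy data: four transactions, with "flu" treated as the sensitive item.
data = [{"a", "b", "flu"}, {"a", "b"}, {"a", "c", "flu"}, {"a", "c"}]
balanced = [data[0], data[1]]  # "flu" in half the cluster, as in the full data
skewed = [data[0], data[2]]    # "flu" in every transaction of the cluster
```

In this toy example, `balanced` passes the check with `k=2, t=0.1`, while `skewed` fails it because every transaction in that cluster contains the sensitive item. A genetic algorithm could use a rule of this kind inside its fitness function, rewarding high intra-cluster Jaccard similarity and penalizing clusters that fail the check.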
