Abstract

Recent years have witnessed the wide availability of transactional datasets for data mining and other research activities. A primary concern in publicly sharing transactional datasets is that the individuals whose data is published may be identified. Data anonymization is a commonly used privacy-preservation method for preventing user identification. However, existing anonymization models for transactional data, such as k^m-anonymity, ρ-uncertainty, and (h, k, p)-coherence, do not provide complete protection against the various possible privacy attacks. Therefore, this article proposes a novel privacy model, called (k, m, t)-anonymity, to effectively prevent identity disclosure, attribute disclosure, and skewness attacks on transactional data. A genetic algorithm-based implementation of the model is also presented. The genetic algorithm clusters transactional data based on the similarity among transactions, enabling effective k^m-anonymization with low information loss. The clustering algorithm simultaneously aims to minimize the skewness of the data distribution within the obtained clusters, which prevents skewness attacks on the anonymized data. The proposed model is implemented with this approach on two real-world datasets (health-domain and click-stream data) and on a large synthetically generated health-domain dataset of 500,000 records. Experimental results verify that the (k, m, t)-anonymity model anonymizes transactional data without significant information loss: in all test scenarios, its relative error is lower than that of the relative privacy and disassociation techniques. Hence, the proposed anonymization model maintains data utility.
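The abstract does not give the formal definition of (k, m, t)-anonymity. As a rough illustration only, the Python sketch below implements two checks that such a model plausibly combines. The first is standard k^m-anonymity, where every itemset of at most m items that occurs in the data must be contained in at least k transactions. The second is a t-closeness-style bound on how far each cluster's distribution of sensitive values may drift from the overall distribution, which is one way to limit skewness. Several choices here are assumptions of this sketch and not the authors' method: the function names, the data layout (transactions as item sets, clusters as lists of sensitive values), and the use of total variation distance. The genetic-algorithm clustering itself is not shown.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Check k^m-anonymity: every itemset of 1..m items that appears in some
    transaction must be contained in at least k transactions."""
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate so each transaction counts once
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    # Itemsets with zero support never enter the counter, so they pass trivially.
    return all(count >= k for count in support.values())


def distribution(values):
    """Relative frequency of each sensitive value in a list (empty list -> {})."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def satisfies_t(clusters, t):
    """Hypothetical skewness check (assumption, not the paper's definition):
    each non-empty cluster's sensitive-value distribution must lie within
    total variation distance t of the distribution over all clusters."""
    non_empty = [c for c in clusters if c]
    overall = distribution([v for c in non_empty for v in c])
    return all(total_variation(distribution(c), overall) <= t for c in non_empty)


if __name__ == "__main__":
    transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "c"}]
    print(is_km_anonymous(transactions, k=2, m=2))  # True
    print(is_km_anonymous(transactions, k=3, m=2))  # False: {"b"} occurs only twice
    print(satisfies_t([["flu", "hiv"], ["flu", "hiv"]], t=0.1))  # True
    print(satisfies_t([["flu", "flu"], ["hiv", "hiv"]], t=0.2))  # False: distance 0.5
```

For categorical values where every pair is equally distant, the Earth Mover's Distance used in t-closeness reduces to this total variation distance. That equivalence is why the simpler formula is used here. Enumerating every itemset of up to m items grows combinatorially with m and with transaction length, so this check is practical only for small m.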
