Abstract

Cloud computing plays an essential role as a platform for outsourcing data for mining or other processing, especially for data owners who lack the resources or expertise to run data mining techniques themselves. However, the privacy of outsourced data is a serious concern. Most data owners apply anonymization-based techniques before outsourcing data for mining in the cloud, in order to prevent identity and attribute disclosure and thus avoid privacy leakage. In addition, data collection and dissemination in a resource-limited network such as a sensor cloud require efficient methods to reduce privacy leakage. The main cause of identity disclosure is quasi-identifier (QID) linking, yet most work on anonymization methods ignores the identification of proper QIDs. This reduces the validity of the anonymization methods used and may lead to a failure of the anonymization process. This paper introduces a new quasi-identifier recognition algorithm that reduces the identity disclosure resulting from QID linking. The proposed algorithm comprises two main stages: (1) attribute classification (or QID recognition) and (2) QID dimension identification. The algorithm is based on the re-identification risk rate of all attributes and on the dimension of the QIDs, from which it determines the proper QIDs and their suitable dimensions. The proposed algorithm was tested on a real dataset. The results demonstrated that the proposed algorithm significantly reduces privacy leakage and maintains data utility compared with recent related algorithms.
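To make the idea of a re-identification risk rate and a QID dimension more concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a simple uniqueness-based risk (the fraction of records that are unique on a given attribute combination), which the abstract does not specify. The function names, the toy table, and its attribute names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def reidentification_risk(records, attrs):
    """Assumed risk measure: share of records whose values on `attrs`
    occur exactly once in the table (uniquely linkable records)."""
    keys = [tuple(r[a] for a in attrs) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


def risk_by_dimension(records, candidates):
    """For each dimension d (number of attributes combined), report the
    highest risk over all d-sized combinations of candidate attributes."""
    result = {}
    for d in range(1, len(candidates) + 1):
        result[d] = max(
            reidentification_risk(records, combo)
            for combo in combinations(candidates, d)
        )
    return result


# Hypothetical toy table; "disease" plays the role of a sensitive attribute.
records = [
    {"age": 34, "zip": "10001", "sex": "F", "disease": "flu"},
    {"age": 34, "zip": "10002", "sex": "F", "disease": "flu"},
    {"age": 41, "zip": "10001", "sex": "M", "disease": "flu"},
    {"age": 41, "zip": "10002", "sex": "M", "disease": "cold"},
]

single = {a: reidentification_risk(records, [a]) for a in ("age", "zip", "sex")}
per_dimension = risk_by_dimension(records, ["age", "zip", "sex"])
print(single)
print(per_dimension)
```

In this toy table no single attribute isolates a record, yet the pair (age, zip) makes every record unique. That is the kind of effect a dimension-aware QID analysis is meant to catch: an attribute can look harmless on its own and still be part of an identifying combination.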

Highlights

  • In the modern information age, many companies are using external sources of data for processing, storing, or obtaining some services such as data mining

  • Accurate identification of QIDs is important for the success and validity of privacy-preserving methods for outsourced data that seek to avoid privacy leakage caused by QID linking

  • This paper aims to classify dataset attributes before the anonymization process and to determine the proper QIDs that should be involved in the anonymization


Summary

Introduction

In the modern information age, many companies rely on external parties for processing or storing data, or for obtaining services such as data mining. Bampoulidis et al. [7] assume that some QIDs are more important than others (i.e., in data mining/analysis) and should be distorted as little as possible in the anonymization process. They present a tool that addresses this issue by using a local recoding algorithm for k-anonymity. Kaur and Agrawal [10] study the impact of QIDs on the anonymization process and suggest new aspects to consider before choosing the quasi-identifiers. It is advisable to take these observations into account before starting the anonymization process, although they are not fixed and may change from one dataset to another.
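As a small illustration of why the choice of QIDs matters for k-anonymity, the sketch below computes the k level of a table as the size of its smallest equivalence class over the chosen QIDs. This is the standard definition of k-anonymity, not code from any of the cited works; the table, its generalized values, and the attribute names are hypothetical.

```python
from collections import Counter


def k_anonymity_level(records, qids):
    """Return k: the size of the smallest group of records that share
    the same values on all chosen quasi-identifiers."""
    classes = Counter(tuple(r[a] for a in qids) for r in records)
    return min(classes.values())


# Hypothetical table, already generalized on age and zip.
table = [
    {"age": "30-39", "zip": "100**", "job": "nurse"},
    {"age": "30-39", "zip": "100**", "job": "teacher"},
    {"age": "40-49", "zip": "100**", "job": "nurse"},
    {"age": "40-49", "zip": "100**", "job": "clerk"},
]

k_without_job = k_anonymity_level(table, ["age", "zip"])
k_with_job = k_anonymity_level(table, ["age", "zip", "job"])
print(k_without_job, k_with_job)
```

If "job" is in fact linkable to external data but is left out of the QID set, the table appears 2-anonymous while every record is unique once "job" is included. The measured protection therefore depends directly on which attributes are recognized as QIDs.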

