Abstract

High-utility pattern mining is an effective technique for extracting significant information from various types of databases. However, analyzing data that contain sensitive private information may raise privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, privacy-preserving utility mining (PPUM) has become an important research topic in recent years. The MSICF algorithm is a sanitization algorithm for PPUM. It selects the victim item based on its conflict count and identifies the victim transaction based on the concept of utility. Although MSICF is effective, its heuristic selection strategy can be improved to obtain a lower ratio of side effects. In this paper, we propose an improved sanitization approach named the Improved Maximum Sensitive Itemsets Conflict First Algorithm (IMSICF) to address this issue. It dynamically calculates the conflict counts of sensitive items during the sanitization process. In addition, IMSICF chooses for modification the transaction with the minimum number of nonsensitive itemsets and the maximum utility in a sensitive itemset. Extensive experiments have been conducted on various datasets to evaluate the effectiveness of the proposed algorithm. The results show that IMSICF outperforms other state-of-the-art algorithms in terms of minimizing side effects on nonsensitive information. Moreover, the influence of correlation among itemsets on the performance of various sanitization algorithms is observed.

Highlights

  • Data mining is used to discover the decision-making knowledge and information from massive data [1,2,3,4]

  • A sanitization approach named the Improved Maximum Sensitive Itemsets Conflict First Algorithm (IMSICF) is presented in detail. The victim item is selected based on the conflict count, which is calculated dynamically

  • An improved sanitization algorithm called IMSICF is proposed for privacy-preserving utility mining


Summary

Introduction

Data mining is used to discover the decision-making knowledge and information from massive data [1,2,3,4]. Data are shared among different companies for mutual benefit. This brings the risk of disclosing sensitive knowledge contained in a database [5]. Atallah et al. [10] first proved that the optimal sensitive-knowledge-hiding problem is NP-hard and proposed a sanitization algorithm based on a heuristic strategy. Thus, the damage to nonsensitive knowledge is serious, and database quality is low when a database is modified. To address this problem, we propose an improved approach called the Improved Maximum Sensitive Itemsets Conflict First Algorithm (IMSICF) for hiding sensitive high-utility itemsets.

Mathematical Problems in Engineering

The transaction with the minimum number of nonsensitive itemsets and having the maximal utility of a sensitive itemset is chosen to be modified, which effectively reduces undesired side effects produced by the sanitization process.
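
The two selection heuristics described here (conflict counts for the victim item, and a victim transaction with few nonsensitive itemsets and high sensitive-itemset utility) can be illustrated with a small sketch. This is not the authors' implementation. The data layout (transactions as item-to-quantity dictionaries, a `profit` table of unit utilities), the function names, the tie-breaking order, and the toy data are all assumptions made for illustration. Utility is taken in the usual high-utility mining sense: quantity times unit profit, summed over the items of an itemset.

```python
from collections import Counter

def contains(transaction, itemset):
    """True if every item of `itemset` occurs in the transaction."""
    return all(transaction.get(item, 0) > 0 for item in itemset)

def itemset_utility(itemset, transaction, profit):
    """Utility of `itemset` in one transaction (0 if not contained)."""
    if not contains(transaction, itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)

def conflict_counts(remaining_sensitive):
    """For each item, count the sensitive itemsets that still contain it.

    Calling this again after an itemset has been hidden gives the
    'dynamic' recalculation (assumed interpretation).
    """
    counts = Counter()
    for itemset in remaining_sensitive:
        counts.update(itemset)
    return counts

def pick_victim_transaction(db, target, nonsensitive, profit):
    """Index of the transaction to modify for sensitive itemset `target`.

    Assumed ordering: fewest contained nonsensitive itemsets first,
    then highest utility of `target` as the tie-breaker.
    """
    candidates = [tid for tid, t in enumerate(db) if contains(t, target)]
    if not candidates:
        return None

    def key(tid):
        t = db[tid]
        n_nonsensitive = sum(1 for n in nonsensitive if contains(t, n))
        return (n_nonsensitive, -itemset_utility(target, t, profit))

    return min(candidates, key=key)

# Hypothetical toy data, not taken from the paper.
profit = {"a": 5, "b": 2, "c": 1}
db = [
    {"a": 1, "b": 2, "c": 3},  # T0
    {"a": 2, "b": 1},          # T1
    {"b": 4, "c": 1},          # T2
]
sensitive = [frozenset("ab"), frozenset("bc")]
nonsensitive = [frozenset("ac")]

counts = conflict_counts(sensitive)
victim_item = max(counts, key=counts.get)
victim_tid = pick_victim_transaction(db, sensitive[0], nonsensitive, profit)
```

In this toy database, item `b` appears in both sensitive itemsets and so has the highest conflict count. For the itemset {a, b}, transaction T1 is preferred over T0 because T0 also supports the nonsensitive itemset {a, c}, so modifying T0 would risk a side effect.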

Related Works
Preliminaries
The Hiding Approach
Experimental Analysis
Findings
Conclusions
